AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 papers today · 8-hour update cycle · 7 days of history
Probabilistic Circuits for Irregular Multivariate Time Series Forecasting
Christian Klötergens, Vijaya Krishna Yalavarthi, Lars Schmidt-Thieme
Time Series
  • CircuITS is a novel architecture that guarantees marginalization consistency in irregular multivariate time series (IMTS) forecasting.
  • The model effectively captures intricate dependencies between time series channels.
  • Extensive experiments show that CircuITS outperforms existing models in joint and marginal density estimation.
  • The architecture is designed to handle irregular data and answer forecasting queries accurately.
Toward Scalable SDN for LEO Mega-Constellations: A Graph Learning Approach
Sivaram Krishnan, Bassel Al Homssi, Zhouyou Gu, Jihong Park, Sung-Min Oh, Jinho Choi
Graph Learning Optimization Theory
  • Introduction of a scalable SDN framework for LEO mega-constellations.
  • Utilization of GNNs to represent constellation topology and Koopman theory for linearizing dynamics.
  • Development of the Graph Koopman Autoencoder (GKAE) for forecasting spatio-temporal behavior (a toy sketch follows below).
  • Demonstrated improvements in spatial compression (42.8%) and temporal forecasting (10.81%) over existing methods.
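For readers who want the mechanism in code: the GKAE wraps a nonlinear encoder/decoder around a linear latent transition, so forecasting reduces to repeated applications of a learned Koopman operator. A minimal PyTorch sketch of that structure, with a plain MLP standing in for the paper's GNN encoder (names and sizes are illustrative, not the authors' code):

```python
import torch
import torch.nn as nn

class KoopmanAutoencoder(nn.Module):
    """Nonlinear encoder/decoder around a *linear* latent transition."""
    def __init__(self, dim_in, dim_z):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_z))
        self.dec = nn.Sequential(nn.Linear(dim_z, 64), nn.ReLU(), nn.Linear(64, dim_in))
        self.K = nn.Linear(dim_z, dim_z, bias=False)  # Koopman operator

    def forward(self, x_t):
        z_t = self.enc(x_t)
        z_next = self.K(z_t)  # one step of linearized latent dynamics
        return self.dec(z_t), self.dec(z_next)

model = KoopmanAutoencoder(dim_in=16, dim_z=8)
x_t, x_next = torch.randn(32, 16), torch.randn(32, 16)  # toy state snapshots
recon, pred = model(x_t)
loss = nn.functional.mse_loss(recon, x_t) + nn.functional.mse_loss(pred, x_next)
loss.backward()
```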
Bayesian policy gradient and actor-critic algorithms
Mohammad Ghavamzadeh, Yaakov Engel, Michal Valko
Reinforcement Learning Theory Robotics
  • Introduces a Bayesian framework for policy gradient methods to reduce sample variance.
  • Models policy gradients as Gaussian processes, providing uncertainty estimates.
  • Develops a new actor-critic model using Bayesian non-parametric critics.
  • Demonstrates improved performance over traditional Monte-Carlo based methods.
Deep Kernel Learning for Stratifying Glaucoma Trajectories
Bruce Rushing, Angela Danquah, Alireza Namazi, Arjun Dirghangi, Heman Shakeri
Time Series NLP Multimodal
  • Introduces a hybrid architecture combining clinical-BERT embeddings with a DKL algorithm for predicting glaucoma patient trajectories (the deep-kernel idea is sketched below).
  • Successfully identifies three clinically distinct patient subgroups based on risk trajectories rather than current disease state.
  • Achieves improved predictive performance compared to standard time-series forecasting methods.
  • Provides calibrated uncertainty quantification to aid in clinical decision-making.
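Deep kernel learning measures similarity in a learned embedding space: a feature map feeds a standard GP kernel. A minimal NumPy sketch of GP regression with such a kernel, where a fixed random tanh map stands in for the paper's clinical-BERT embeddings (illustrative only, not the authors' pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))          # stand-in feature extractor (would be learned)
feats = lambda X: np.tanh(X @ W)     # "deep" embedding of the raw inputs

def deep_kernel(X1, X2, ls=1.0):
    F1, F2 = feats(X1), feats(X2)
    d2 = ((F1[:, None, :] - F2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls**2))  # RBF kernel applied in embedding space

X = rng.normal(size=(50, 6))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)

# Standard GP posterior mean, with the deep kernel replacing a raw-input kernel.
K = deep_kernel(X, X) + 1e-2 * np.eye(len(X))
alpha = np.linalg.solve(K, y)
X_new = rng.normal(size=(5, 6))
mean = deep_kernel(X_new, X) @ alpha
```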
Budget Constraints as Riemannian Manifolds
Michael Helcig, Dan Alistarh
Optimization Efficient ML Theory
  • Introduction of the budget manifold as a smooth Riemannian submanifold in logit space.
  • Development of Riemannian Constrained Optimization (RCO), which enforces budget constraints without hyperparameters (the tangent-space projection step is sketched below).
  • Demonstration of the method's effectiveness on synthetic knapsack problems and LLM compression tasks.
  • RCO achieves optimal solutions where traditional methods fail, particularly in high-compression scenarios.
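The mechanic behind RCO is that every step stays on the constraint surface: the Euclidean gradient is projected onto the tangent space of the budget manifold before updating. A hedged NumPy sketch of that step on a toy linear budget sum(x) = b, rather than the paper's logit-space manifold:

```python
import numpy as np

def riemannian_step(x, grad, lr=0.1):
    """One projected-gradient step on the manifold {x : sum(x) = const}.
    The constraint normal is n = grad g = 1; removing the gradient's
    normal component keeps every iterate exactly on the budget surface."""
    n = np.ones_like(x)                          # gradient of g(x) = sum(x) - b
    tangent_grad = grad - (grad @ n) / (n @ n) * n
    return x - lr * tangent_grad                 # linear constraint: no retraction needed

x = np.array([0.5, 0.3, 0.2])                    # feasible start (sums to 1)
grad = np.array([1.0, -2.0, 0.5])                # some objective gradient
x = riemannian_step(x, grad)
print(x.sum())                                   # still 1.0: budget preserved
```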
Cross-Subject Generalization for EEG Decoding: A Survey of Deep Learning Methods
Taida Li, Yujun Yan, Fei Dou, Wenzhan Song, Xiang Zhang
Time Series
  • High inter-subject variability in EEG signals poses a significant challenge for deep learning models.
  • The survey categorizes methodologies into four main families: feature alignment, adversarial learning, feature disentanglement, and contrastive learning.
  • A rigorous evaluation framework is essential for assessing the effectiveness of cross-subject generalization methods.
  • The paper emphasizes the need to utilize subject-level information to improve model robustness and generalizability.
Statistical Channel Fingerprint Construction for Massive MIMO: A Unified Tensor Learning Framework
Zhenzhou Jin, Li You, Xiang-Gen Xia, Xiqi Gao
Theory Optimization Efficient ML
  • Introduction of statistical channel fingerprints (sCF) for massive MIMO systems.
  • Unified tensor representation and dimension reduction of sCF using eigenvalue decomposition.
  • Development of LPWTNet architecture for efficient inference and multi-scale frequency capture.
  • Implementation of shared mask learning for adaptive refinement of sCF components.
Bayesian Optimization in Linear Time
Jesse Schneider, William J. Welch
Optimization
  • TreeBO reduces the computational complexity of Bayesian optimization from cubic to linear (the partition-and-local-GP idea is sketched below).
  • The method improves the balance between local and global modeling of objective functions.
  • Empirical results show superior optimization performance on seven test functions.
  • TreeBO is simpler to tune than existing partitioning methods, requiring only one additional hyperparameter.
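The linear scaling comes from replacing one global GP over all n points with many small local GPs, each cubic only in its own cell size. A toy scikit-learn sketch of the partition-plus-local-GP idea under a UCB acquisition (TreeBO's actual tree construction and tuning differ; this only shows where the speedup originates):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=20)
y = np.sin(3 * X)                    # toy objective observations

# One small GP per cell: each fit is O(m^3) on m << n points.
cells = [(0.0, 1.0), (1.0, 2.0)]
gps = {}
for lo, hi in cells:
    m = (X >= lo) & (X < hi)
    gps[(lo, hi)] = GaussianProcessRegressor(alpha=1e-6).fit(X[m, None], y[m])

# UCB acquisition scored by whichever local model owns each candidate.
cands = rng.uniform(0, 2, size=200)
scores = []
for c in cands:
    gp = gps[cells[0]] if c < 1.0 else gps[cells[1]]
    mu, sd = gp.predict([[c]], return_std=True)
    scores.append(mu[0] + 2.0 * sd[0])
x_next = cands[int(np.argmax(scores))]
```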
People-Centred Medical Image Analysis
Zheng Zhang, Milad Masroor, Cuong Nguyen, Tahir Hassan, Yuanhong Chen, David Rosewarne, Kevin Wells, Thanh-Toan Do, Gustavo Carneiro
Computer Vision
  • PecMan framework integrates AI fairness, learning-to-defer (L2D), and learning-to-complement (L2C) to improve diagnostic accuracy and equity.
  • Introduces the FairHAI benchmark for evaluating AI systems based on accuracy, fairness, and clinician workload.
  • Demonstrates that addressing fairness and workflow integration together leads to better clinical outcomes.
  • Experimental results show PecMan outperforms existing isolated approaches in medical image analysis.
AMGenC: Generating Charge Balanced Amorphous Materials
Yan Lin, Jilin Hu, N. M. Anoop Krishnan, Morten M. Smedskjaer
Generative Models Optimization
  • AMGenC guarantees the generation of charge-balanced amorphous materials.
  • The method introduces a novel approach combining element noise, soft projections, and discrete projections.
  • AMGenC reduces the time to obtain charge-balanced samples by up to two orders of magnitude compared to existing methods.
  • Extensive experiments validate the effectiveness and accuracy of AMGenC across multiple configurations.
Pessimism-Free Offline Learning in General-Sum Games via KL Regularization
Claire Chen, Yuheng Zhang
Reinforcement Learning Theory Optimization
  • KL regularization serves as an effective alternative to explicit pessimism in offline learning.
  • The General-sum Anchored Nash Equilibrium (GANE) achieves an accelerated statistical rate of O(1/n) for Nash equilibria.
  • The General-sum Anchored Mirror Descent (GAMD) algorithm provides a computationally efficient method for recovering Coarse Correlated Equilibria.
  • The proposed methods eliminate the need for complex hyperparameter tuning associated with traditional pessimistic approaches.
Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation
Haichen Hu, Jian Qian, David Simchi-Levi
Reinforcement Learning Theory Efficient ML
  • Introduces a novel algorithm for offline oracle-efficient episodic reinforcement learning.
  • Achieves optimal regret bounds with significantly reduced oracle call complexity.
  • Generalizes the approach to linear MDPs with infinite state and action spaces.
  • Demonstrates the first doubly oracle-efficient regret minimization algorithm for MDPs.
A Comparative Study of QSPR Methods on a Unique Multitask PAMPA dataset
András Formanek, Anna Vincze, Richárd Bicsak, Yves Moreau, Gyorgy T. Balogh, Adam Arany
Theory Interpretability
  • Introduces a unique multitask dataset of 143 drug molecules evaluated across six PAMPA setups.
  • Compares various QSPR methods, highlighting the effectiveness of traditional descriptors over deep learning models for small datasets.
  • Focuses on the balance between predictive accuracy and model interpretability in drug permeability predictions.
  • Provides novel insights into membrane-specific permeability profiles, aiding in drug discovery processes.
Weisfeiler Lehman Test on Combinatorial Complexes: Generalized Expressive Power of Topological Neural Networks
Jiawen Chen, Qi Shao, Duxin Chen, Wenwu Yu
Graph Learning Theory Efficient ML
  • Introduction of the Combinatorial Complex Weisfeiler-Lehman (CCWL) test for topological neural networks (the classic 1-WL procedure it generalizes is sketched below).
  • Establishment of a unified theoretical framework for topological message passing across various combinatorial structures.
  • Proof that upper and lower neighborhood relations suffice for full expressivity in the CCWL framework.
  • Development of the Combinatorial Complex Isomorphism Network (CCIN) that outperforms existing methods.
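The CCWL test lifts the classic 1-WL color-refinement procedure from graph nodes to the cells of a combinatorial complex with upper and lower neighborhoods. For reference, a minimal Python sketch of plain 1-WL, the procedure being generalized:

```python
def wl_refinement(adj, rounds=3):
    """Classic 1-WL: iteratively recolor each node by hashing its own color
    together with the multiset of its neighbors' colors."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in adj}
    return colors

triangle_plus_isolated = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
print(wl_refinement(triangle_plus_isolated))  # node 3 gets its own color
```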
Learning Rate Transfer in Normalized Transformers
Boris Shigida, Boris Hanin, Andrey Gromov
Optimization Theory Efficient ML
  • Introduction of νGPT, a novel parameterization for Normalized Transformers.
  • Demonstrates effective learning rate transfer across model width, depth, and token horizon.
  • Empirical validation shows no performance loss compared to the original nGPT.
  • Utilizes alignment exponents to refine hyperparameter transfer techniques.
Co-Evolving Policy Distillation
Naibin Gu, Chenxu Yang, Qingyi Si, Chuanyu Qin, Dingyu Yao, Peng Fu, Zheng Lin, Weiping Wang, Nan Duan, Jiaqi Wang
Reinforcement Learning Multimodal
  • Identifies limitations of traditional RLVR and OPD methods due to behavioral distance between teacher and student models.
  • Proposes CoPD, which interleaves RLVR and mutual OPD for continuous co-evolution of expert models.
  • Demonstrates that CoPD outperforms existing methods in multimodal reasoning tasks.
  • Establishes that maintaining behavioral proximity enhances knowledge transfer during training.
BoostLoRA: Growing Effective Rank by Boosting Adapters
Raviteja Anantha, Nick Levato, Layne C. Price
NLP Large Language Models Efficient ML
  • BoostLoRA allows for linear growth of effective rank through iterative training and merging of ultra-low-parameter adapters (the merge-and-restart loop is sketched below).
  • The ROTATE SVD basis strategy ensures that each adapter operates in an orthogonal subspace, enhancing model expressivity.
  • BoostLoRA achieves superior performance on benchmarks like GSM8K and MATH-500 compared to existing methods.
  • The framework maintains zero inference overhead by discarding merged adapters after training.
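The part that grows effective rank is the merge-and-restart loop: each ultra-low-rank adapter is folded into a running weight update, then a fresh adapter begins. A toy PyTorch sketch of that loop (random adapters stand in for trained ones, and the ROTATE SVD basis strategy is omitted):

```python
import torch

d, r = 64, 2
W = torch.randn(d, d)        # frozen base weight
delta = torch.zeros(d, d)    # accumulated, merged low-rank updates

for step in range(3):
    A = torch.randn(d, r) / d**0.5   # fresh low-rank pair (trained in practice)
    B = torch.randn(r, d) / d**0.5
    delta = delta + A @ B            # merge the adapter, then discard it
    print(step, torch.linalg.matrix_rank(delta).item())  # rank grows ~r per round

W_final = W + delta  # zero inference overhead: nothing extra to run at test time
```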
Information-Theoretic Generalization Bounds for Stochastic Gradient Descent with Predictable Virtual Noise
Mohammad Partohaghighi
Theory Optimization
  • Introduces history-adaptive virtual perturbations for SGD analysis.
  • Replaces fixed perturbation geometries with adaptive covariances based on past optimization history.
  • Establishes information-theoretic generalization bounds that account for dynamic optimization processes.
  • Demonstrates that the framework can recover existing bounds as special cases.
Online semi-supervised perception: Real-time learning without explicit feedback
Branislav Kveton, Michal Valko, Matthai Phillipose, Ling Huang
Computer Vision Graph Learning Theory
  • Proposes a novel algorithm for real-time learning without explicit feedback.
  • Combines semi-supervised learning on graphs with online learning techniques.
  • Demonstrates superior performance in real-time face recognition tasks.
  • Establishes a regret bound for the quality of solutions provided by the algorithm.
Trading off rewards and errors in multi-armed bandits
Akram Erraqabi, Alessandro Lazaric, Michal Valko, Emma Brunskill, Yun-En Liu
Reinforcement Learning Theory Optimization
  • Introduces a new objective function for balancing rewards and estimation errors in MAB settings.
  • Develops the ForcingBalance algorithm, which optimizes the proposed objective function.
  • Proves that ForcingBalance achieves asymptotic regret rates comparable to the best strategies for both cumulative reward and active exploration.
  • Demonstrates the algorithm's effectiveness through empirical simulations on educational data.
Scalable Context-Aware Graph Attention for Unsupervised Anomaly Detection in Large-Scale Mobile Networks
Sara Malacarne, Eirik Hoel-Høiseth, Erlend Aune, David Zsolt Biró, Massimiliano Ruocco
Graph Learning Time Series
  • Introduction of C-MTAD-GAT, a centralized context-aware anomaly detection system for telecom networks.
  • Development of a domain-agnostic calibration protocol based solely on validation errors (a label-free thresholding sketch follows below).
  • Validation across multiple datasets, including TELCO, RAN, and EPC control-plane data.
  • Demonstration of scalability and stability in anomaly detection as the number of network elements increases.
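A calibration protocol "based solely on validation errors" can be as simple as a robust quantile rule over held-out errors. A hedged sketch of one such label-free rule, median plus scaled MAD (the paper's protocol may differ in detail):

```python
import numpy as np

def calibrate_threshold(val_errors, k=3.0):
    """Label-free anomaly threshold from validation errors only."""
    med = np.median(val_errors)
    mad = np.median(np.abs(val_errors - med))   # robust spread estimate
    return med + k * 1.4826 * mad               # 1.4826: MAD-to-sigma under Gaussianity

rng = np.random.default_rng(0)
val_errors = rng.gamma(2.0, 1.0, size=5000)     # stand-in validation errors
threshold = calibrate_threshold(val_errors)
print((val_errors > threshold).mean())          # small false-alarm rate on clean data
```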
Physical Foundation Models: Fixed hardware implementations of large-scale neural networks
Logan G Wright, Tianyu Wang, Tatsuhiro Onodera, Peter L. McMahon
Efficient ML Large Language Models Theory
  • PFMs could significantly reduce energy consumption and improve performance for large-scale AI models.
  • The paper advocates for hardware implementations that utilize the physical properties of materials for computation.
  • PFMs may enable the deployment of AI models with parameter counts reaching 10^18.
  • The authors discuss the challenges and open questions in realizing PFMs in practical applications.
AdaBFL: Multi-Layer Defensive Adaptive Aggregation for Byzantine-Robust Federated Learning
Zehui Tang, Yuchen Liu, Feihu Huang
Federated Learning
  • Introduction of AdaBFL, a multi-layer defensive aggregation method for Byzantine-robust federated learning.
  • Development of a three-layer defense mechanism that adapts to different types of poisoning attacks (one standard robust-aggregation building block is sketched below).
  • Theoretical proof of convergence for AdaBFL under non-convex settings with non-iid data.
  • Extensive experimental validation demonstrating AdaBFL's effectiveness compared to existing methods.
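For intuition about what a single defensive layer can do, here is a minimal sketch of coordinate-wise trimmed-mean aggregation, a standard Byzantine-robust building block (AdaBFL's three adaptive layers are more involved; this is not the paper's method):

```python
import numpy as np

def trimmed_mean(updates, trim=1):
    """Sort each coordinate across clients and average after dropping the
    `trim` largest and smallest values, bounding any one client's influence."""
    U = np.sort(np.asarray(updates), axis=0)
    return U[trim:len(updates) - trim].mean(axis=0)

rng = np.random.default_rng(0)
honest = [np.ones(4) + 0.1 * rng.normal(size=4) for _ in range(5)]
poisoned = honest + [np.full(4, 100.0)]   # one attacker sends a huge update
print(trimmed_mean(poisoned))             # stays near the honest mean of ~1
```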
Optimal Spatio-Temporal Decoupling for Bayesian Conformal Prediction
Yu-Hsueh Fang, Chia-Yen Lee
Time Series Theory Optimization
  • Introduction of the State-Adaptive Bayesian Conformal Prediction (SA-BCP) framework (the split-conformal baseline it extends is sketched below).
  • SA-BCP effectively decouples temporal and spatial components for improved prediction intervals.
  • Rigorous theoretical analysis establishes a minimax bias-variance tradeoff.
  • Empirical results show significant improvements in minimizing under-coverage and interval bloat.
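SA-BCP builds on conformal prediction, where a held-out residual quantile calibrates the interval width. A minimal NumPy sketch of plain split conformal regression, the baseline being extended (no Bayesian or spatio-temporal components here):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=400)
y = np.sin(x) + 0.2 * rng.normal(size=400)

# Fit on one half, calibrate the residual quantile on the other.
fit, cal = slice(0, 200), slice(200, 400)
coef = np.polyfit(x[fit], y[fit], deg=3)
pred = lambda t: np.polyval(coef, t)

alpha = 0.1
scores = np.abs(y[cal] - pred(x[cal]))
q = np.quantile(scores, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))

x_new = 1.5
print(pred(x_new) - q, pred(x_new) + q)   # ~90% coverage prediction interval
```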
Learning from a single labeled face and a stream of unlabeled data
Branislav Kveton, Michal Valko
Computer Vision
  • Introduces Online Manifold Tracking (OMT) for face recognition from a single labeled image and unlabeled data.
  • Frames the problem as one-class classification, addressing the lack of negative examples.
  • Achieves 90% identification accuracy with nearly zero false positives, outperforming existing methods.
  • Demonstrates real-time performance with an average recognition time of 0.05 seconds.
AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
Ahan Gupta, Zhihao Wang, Neel Dani, Masahiro Tanaka, Olatunji Ruwase, Minjia Zhang
Large Language Models NLP Efficient ML
  • AutoSP is the first automated solution for optimizing LLM training for long-context tasks.
  • It integrates sequence parallelism and activation-checkpointing into a PyTorch-native compilation framework.
  • The method significantly increases the maximum input context length for LLMs without compromising training speed.
  • AutoSP is compatible with both NVIDIA and AMD hardware, demonstrating versatility in application.
An adaptive wavelet-based PINN for problems with localized high-magnitude source
Himanshu Pandey, Ratikanta Behera
Theory Optimization Efficient ML
  • AW-PINN effectively addresses loss imbalance in PINNs for PDEs with localized high-magnitude sources.
  • The framework adapts wavelet basis functions dynamically, improving efficiency and accuracy.
  • AW-PINN does not require automatic differentiation, accelerating the training process.
  • The method shows superior performance on various PDEs compared to existing techniques.
Batch Normalization for Neural Networks on Complex Domains
Xuan Son Nguyen, Nistor Grozavu
Theory
  • Introduction of batch normalization layers for neural networks on complex domains (a flat-space variant is sketched below).
  • Focus on less-studied complex domains like the Siegel disk and complex unit ball.
  • Demonstrated improvements in training stability and accuracy in various machine learning tasks.
  • Connection to existing Riemannian batch normalization layers.
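The simplest instance of the idea: center complex activations by their complex mean and rescale by the spread of their magnitudes. A toy PyTorch sketch of this flat-space variant (the paper's layers act on curved domains like the Siegel disk, which this does not capture):

```python
import torch

def complex_batchnorm(z, eps=1e-5):
    """Naive complex batch norm: subtract the complex mean, then divide by
    the root-mean-square magnitude of the centered activations."""
    mean = z.mean(dim=0, keepdim=True)
    zc = z - mean
    scale = torch.sqrt((zc.abs() ** 2).mean(dim=0, keepdim=True) + eps)
    return zc / scale

z = torch.randn(32, 8, dtype=torch.complex64)
out = complex_batchnorm(z)
print((out.abs() ** 2).mean(dim=0))  # roughly 1 for every feature
```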
AirFM-DDA: Air-Interface Foundation Model in the Delay-Doppler-Angle Domain for AI-Native 6G
Kejia Bian, Meixia Tao, Jianhua Mo, Zhiyong Chen, Leyan Chen
Optimization Efficient ML Theory
  • AirFM-DDA operates in the Delay-Doppler-Angle domain, improving the representation of multipath components.
  • The model utilizes a window-based attention mechanism to reduce computational complexity.
  • AirFM-DDA achieves superior zero-shot generalization and outperforms existing models in channel-related tasks.
  • The model demonstrates robustness under high mobility and severe noise conditions.
Cost-Aware Learning
Clara Mohri, Amir Globerson, Haim Kaplan, Tomer Koren, Yishay Mansour
Reinforcement Learning Large Language Models Optimization
  • Introduction of Cost-Aware Learning framework for machine learning with varying sample costs.
  • Development of the Cost-Aware SGD algorithm, with theoretical guarantees on cost and error (a toy cost-aware sampling loop follows below).
  • Proposal of Cost-Aware GRPO for efficient policy optimization in reinforcement learning.
  • Empirical results indicate significant reductions in training costs while preserving model performance.
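One way to make an SGD loop cost-aware is to meter a budget and bias sampling toward cheap examples. A toy NumPy sketch of that pattern on least squares (one illustrative policy only; the paper's Cost-Aware SGD carries guarantees this loop does not reproduce):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X, w_true = rng.normal(size=(n, d)), rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
costs = rng.uniform(0.1, 1.0, size=n)        # heterogeneous per-sample costs

w, budget, lr = np.zeros(d), 200.0, 0.01
probs = (1 / costs) / (1 / costs).sum()      # favor cheap samples
while budget > 0:
    i = rng.choice(n, p=probs)
    budget -= costs[i]                       # pay for the sample you draw
    w -= lr * 2 * (X[i] @ w - y[i]) * X[i]   # plain SGD step on squared loss
```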
Preserving Temporal Dynamics in Time Series Generation
Ci Lin, Futong Li, Tet Yeap, Iluju Kiringa
Generative Models Time Series
  • Proposes an MCMC-based framework to preserve temporal dynamics in synthetic time series generation (a Metropolis-Hastings sketch follows below).
  • Highlights the limitations of existing GAN approaches that focus on marginal distribution matching.
  • Demonstrates that the MCMC framework improves temporal fidelity and predictive performance across multiple datasets.
  • Provides a theoretical analysis of distribution shift in autoregressive generation.
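In the simplest form, each new value is drawn by a short Metropolis-Hastings chain targeting a learned transition density, rather than by direct ancestral sampling. A toy NumPy sketch with an AR(1) log-density standing in for a learned model:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x_next, x_prev):
    """Toy AR(1) conditional log-density (a learned model in practice)."""
    return -0.5 * (x_next - 0.8 * x_prev) ** 2 / 0.25

def mh_step(x_prev, n_steps=50, step=0.5):
    x = x_prev
    for _ in range(n_steps):
        prop = x + step * rng.normal()       # symmetric random-walk proposal
        if np.log(rng.uniform()) < log_target(prop, x_prev) - log_target(x, x_prev):
            x = prop
    return x

series = [0.0]
for _ in range(100):
    series.append(mh_step(series[-1]))       # each step targets p(x_t | x_{t-1})
```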
Context-Aware Graph Attention for Unsupervised Telco Anomaly Detection
Sara Malacarne, Eirik Hoel-Høiseth, Erlend Aune, David Zsolt Biro, Massimiliano Ruocco
Graph Learning Time Series Efficient ML
  • C-MTAD-GAT is a context-aware, unsupervised anomaly detection model tailored for mobile network KPIs.
  • The model combines graph attention with context embeddings to effectively handle multivariate time series data.
  • Detection thresholds are calibrated without labeled data, maintaining a fully unsupervised pipeline.
  • C-MTAD-GAT outperforms existing models in both precision and recall while minimizing false alarms.
ChipLingo: A Systematic Training Framework for Large Language Models in EDA
Lei Li, Xingwen Yu, Jianguo Ni, Junxuan Zhu, Jieqiong Zhang, Jian Zhao, Zhi Liu
Large Language Models NLP
  • ChipLingo provides a systematic training pipeline for adapting LLMs to the EDA domain.
  • The framework includes data curation, domain-adaptive pretraining, and RAG scenario training.
  • Experimental results demonstrate significant performance improvements over baseline models.
  • The study highlights the importance of QA augmentation and specific training strategies for domain adaptation.
ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
Zihan Lin, Xiaohan Wang, Jie Cao, Jiajun Chai, Li Wang, Xiaodong Lu, Wei Lin, Ran He, Guojun Yin
NLP Large Language Models Reinforcement Learning
  • ResRL effectively decouples the semantic distributions of positive and negative responses to enhance reasoning diversity.
  • The framework introduces a theoretical connection between Lazy Likelihood Displacement and gradient interference, providing a new proxy for gradient updates.
  • Empirical results show ResRL surpasses existing methods like NSR and GRPO in various reasoning tasks.
  • The method employs low-rank approximation for computational efficiency while maintaining performance.
Federated Learning with Hypergradient-based Online Update of Aggregation Weights
Ayano Nakai-Kasai, Tadashi Wadayama
Federated Learning
  • Introduction of FedHAW for online aggregation weight updates in federated learning.
  • Utilization of hypergradient descent for efficient adaptation to heterogeneous data and communication environments (a generic hypergradient sketch follows below).
  • Elimination of the need for additional training data compared to existing methods like FedLAW.
  • Demonstration of high generalization performance and robustness to communication errors through simulations.
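Hypergradient-based updates treat the aggregation weights themselves as parameters and descend a server-side loss through the aggregation step. A toy PyTorch illustration of that generic pattern (FedHAW itself avoids the extra held-out data this toy uses):

```python
import torch

client_params = [torch.randn(5) for _ in range(4)]  # stand-in client model vectors
logits = torch.zeros(4, requires_grad=True)         # aggregation weights, pre-softmax
opt = torch.optim.SGD([logits], lr=0.1)
x_val, y_val = torch.randn(20, 5), torch.randn(20)  # server-side data (toy only)

for _ in range(50):
    w = torch.softmax(logits, dim=0)
    theta = sum(wi * p for wi, p in zip(w, client_params))  # weighted aggregation
    loss = ((x_val @ theta - y_val) ** 2).mean()
    opt.zero_grad()
    loss.backward()      # hypergradient of the loss w.r.t. the weights
    opt.step()
```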
Towards Robust and Scalable Density-based Clustering via Graph Propagation
Yingtao Zheng, Hugo Phibbs, Ninh Pham
Graph Learning Efficient ML Theory
  • CluProp redefines density-based clustering as a graph propagation process, improving scalability and robustness.
  • The DANE algorithm enables efficient label propagation from local density peaks, enhancing clustering in heterogeneous datasets (a density-peaks propagation sketch follows below).
  • CluProp achieves superior performance on large-scale datasets, processing millions of points quickly while maintaining high accuracy.
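Propagation from density peaks can be pictured as each point linking to its nearest denser neighbor and inheriting the label of the peak it reaches. A small NumPy/scikit-learn sketch of that classic density-peaks flavor (DANE's propagation differs in its details):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])

dist, idx = NearestNeighbors(n_neighbors=11).fit(X).kneighbors(X)
density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-12)  # kNN density estimate

# Link each point to its closest denser neighbor; local density peaks keep
# pointing at themselves and become cluster seeds.
parent = np.arange(len(X))
for i in range(len(X)):
    denser = [j for j in idx[i, 1:] if density[j] > density[i]]
    if denser:
        parent[i] = min(denser, key=lambda j: np.linalg.norm(X[i] - X[j]))

def root(i):                      # follow links up to the owning peak
    while parent[i] != i:
        i = parent[i]
    return i

labels = np.array([root(i) for i in range(len(X))])
print(len(np.unique(labels)))     # number of discovered clusters
```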
Learning from Disagreement: Clinician Overrides as Implicit Preference Signals for Clinical AI in Value-Based Care
Prabhjot Singh, Abhishek Gupta, Chris Betz, Abe Flansburg, Brett Ives, Sudeep Lama, Jung Hoon Son
Reinforcement Learning Theory Optimization
  • Overrides should be viewed as implicit preference signals rather than compliance failures.
  • A dual learning architecture is proposed to train both reward and capability models simultaneously.
  • Override data in chronic disease management has unique properties that enhance preference learning.
  • Clinician capability significantly influences decision-making and should be factored into AI training.
Fair Dataset Distillation via Cross-Group Barycenter Alignment
Mohammad Hossein Moslemi, Nima Hosseini Dashtbayaz, Zhimin Mei, Boyu Wang, Bissan Ghaddar
Theory Optimization
  • Bias amplification in dataset distillation arises from the interaction between group imbalance and representational separation.
  • COBRA framework introduces a barycenter alignment approach to ensure fair representation across demographic groups.
  • The proposed method is compatible with existing dataset distillation techniques.
  • Empirical results show significant fairness improvements across various datasets and distillation methods.
A Dirac-Frenkel-Onsager principle: Instantaneous residual minimization with gauge momentum for nonlinear parametrizations of PDE solutions
Matteo Raviola, Benjamin Peherstorfer
Optimization Theory
  • Introduces the Dirac-Frenkel-Onsager principle to address non-uniqueness in parameter dynamics.
  • Utilizes a history variable as momentum to promote smooth parameter evolution.
  • Maintains the instantaneous residual minimization property of the Dirac-Frenkel principle.
  • Demonstrates increased robustness in singular and near-singular regimes.
Anomaly Detection in Soil Heavy Metal Contamination Using Unsupervised Learning for Environmental Risk Assessment
Isaac Tettey Adjokatse, Samuel Senyo Koranteng, George Yamoah Afrifa, Theophilus Ansah-Narh, Marcellin Atemkeng, Joseph Bremang Tandoh, Kow Ahor Essel-Yorke, Richmond Opoku-Sarkodie, Rebecca Davis
Theory
  • Unsupervised learning effectively identifies anomalous heavy metal contamination in soil.
  • Isolation Forest and PCA reconstruction error detected significant anomalies in soil samples (both detectors are sketched below).
  • The study found that anomalies had 70-80% higher health risk indices compared to normal samples.
  • Three distinct types of anomalies were identified, indicating varied contamination patterns.
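Both detectors named above are available off the shelf in scikit-learn. A minimal sketch combining them, with synthetic data standing in for the soil heavy-metal measurements:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                  # stand-in element concentrations

iso = IsolationForest(random_state=0).fit(X)
iso_scores = -iso.score_samples(X)             # higher = more anomalous

pca = PCA(n_components=3).fit(X)
rec_err = ((X - pca.inverse_transform(pca.transform(X))) ** 2).sum(axis=1)

# Flag samples anomalous under either detector (95th-percentile cutoffs).
flags = (iso_scores > np.quantile(iso_scores, 0.95)) | \
        (rec_err > np.quantile(rec_err, 0.95))
print(flags.sum(), "samples flagged")
```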
PROMISE-AD: Progression-aware Multi-horizon Survival Estimation for Alzheimer's Disease Progression and Dynamic Tracking
Qing Lyu, Jeremy Hudson, Mohammad Kawas, Yuming Jiang, Chenyu You, Christopher T Whitlow
Time Series
  • PROMISE-AD effectively handles irregular clinical histories and missing data through progression-aware visit tokenization.
  • The framework employs a temporal Transformer to balance long-term progression history with recent clinical states.
  • It utilizes a hybrid approach combining discrete-time mixture hazards with various regularization techniques for calibrated risk estimation.
  • The model achieved the lowest integrated Brier score for CN-to-MCI conversion and the highest C-index for MCI-to-AD conversion among compared methods.
Distributional Alignment Games for Answer-Level Fine-Tuning
Mehryar Mohri, Jon Schneider, Yifan Wu
NLP Large Language Models Optimization
  • Introduces a game-theoretical framework for optimizing language models based on answer correctness.
  • Transforms the intractable marginalization problem into a tractable projection problem using Distributional Alignment Games.
  • Unifies various alignment strategies, including diversity and coherence, under a single theoretical lens.
  • Demonstrates significant complexity gains in reasoning tasks through efficient algorithms like Coherence-GRPO.
Free Energy Surface Sampling via Reduced Flow Matching
Zichen Liu, Tiejun Li
Efficient ML Theory
  • Introduction of FES-FM, a reduced flow matching method for free energy sampling.
  • Utilization of a Hessian-informed prior distribution for many-particle systems.
  • Significant reduction in computational costs while improving sampling accuracy.
  • Demonstration of the method's effectiveness across various benchmark potentials.
From Prediction to Practice: A Task-Aware Evaluation Framework for Blood Glucose Forecasting
Alireza Namazi, Heman Shakeri
Time Series
  • Standard aggregate metrics can obscure critical failures in blood glucose forecasting models.
  • The proposed task-aware evaluation framework includes both observational and interventional evaluation arms.
  • Models may perform well on average but fail in high-risk scenarios, particularly post-bolus periods.
  • The interventional evaluation reveals that many models struggle to predict the consequences of altered insulin dosing.
Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values
Shradha Sharma, Swapnil Dhamal, Shweta Jain
Federated Learning Theory Optimization
  • Introduces the K-Shapley value to measure arm contributions in BCMAB-FBF settings (a plain Monte Carlo Shapley sketch follows below).
  • Proposes K-SVFair-FBF algorithm that balances fairness and effective learning under full-bandit feedback.
  • Achieves a theoretical regret bound of O(T^{3/4}), addressing noise from both learning and Monte Carlo methods.
  • Demonstrates improved fairness and performance over existing methods in practical applications.
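Shapley values are typically estimated by Monte Carlo over random arm orderings: average each arm's marginal contribution when it joins the arms preceding it. A small sketch of plain MC Shapley with a toy coalition value (the paper's K-Shapley restricts coalition sizes, which is not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)
arms = [1, 2, 3, 4]

def value(coalition):
    """Toy coalition value with diminishing returns (stand-in for the
    expected reward of playing a set of arms)."""
    return sum(coalition) ** 0.5

def mc_shapley(arm, n_samples=2000):
    total = 0.0
    for _ in range(n_samples):
        perm = list(rng.permutation(arms))
        before = set(perm[: perm.index(arm)])
        total += value(before | {arm}) - value(before)
    return total / n_samples

print([round(mc_shapley(a), 3) for a in arms])  # estimates sum to value({1,2,3,4})
```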
Binomial flows: Denoising and flow matching for discrete ordinal data
Yair Shenfeld, Ricardo Baptista, Stefano Peluchetti
Generative Models Theory Optimization
  • Introduction of Binomial flows for generative modeling of discrete ordinal data.
  • Establishment of a discrete analogue to Tweedie's formula using Binomial noise.
  • Development of a framework that allows for denoising, sampling, and exact likelihood estimation.
  • Validation of the methodology on synthetic and real-world datasets with competitive results.
Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization
YiFeng Wang, Zhun Sun, Keisuke Sakaguchi
Large Language Models Efficient ML Optimization
  • ARHQ effectively reduces error propagation in low-bit quantization of LLMs.
  • The method isolates error-sensitive weight directions using a residual Hessian approach.
  • ARHQ significantly improves layer-wise SNR and reasoning performance in aggressive quantization scenarios.
  • The approach is adaptable to specific quantization hardware and calibration distributions.
Unlearning Offline Stochastic Multi-Armed Bandits
Zichun Ye, Runqi Wang, Xuchuang Wang, Xutong Liu, Shuai Li, Mohammad Hajiesmaili
Reinforcement Learning Theory Efficient ML
  • Introduces the first study of unlearning in offline stochastic multi-armed bandits.
  • Formalizes privacy constraints and utility measurement in the context of unlearning.
  • Develops adaptive algorithms that combine Gaussian mechanism and rollback methods.
  • Establishes theoretical performance guarantees and lower bounds for unlearning scenarios.