AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

66 Papers today
8h Update frequency
7 Days of history
Improving Feasibility via Fast Autoencoder-Based Projections
Maria Chzhen, Priya L. Donti
Reinforcement Learning Optimization Efficient ML
  • Introduces a data-driven approach for enforcing complex operational constraints using autoencoders.
  • Develops a structured latent representation of the feasible set through adversarial training.
  • Demonstrates significant computational efficiency in correcting infeasible predictions.
  • Empirical results show near 100% feasibility in constrained optimization tasks and improved safety in reinforcement learning.
Read more
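A minimal sketch of the projection idea, assuming a toy constraint set (the unit sphere) and a plain reconstruction loss rather than the paper's adversarially trained latent structure: train an autoencoder on feasible points, then pass infeasible predictions through encode/decode so they land near the feasible set.
```python
# Illustrative sketch, not the paper's method: an autoencoder trained on
# feasible points acts as a fast, approximate projection onto that set.
import torch
import torch.nn as nn

class FeasibilityAE(nn.Module):
    def __init__(self, dim=8, latent=3):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, dim))

    def forward(self, x):
        return self.dec(self.enc(x))

# Toy feasible set: the unit sphere in 8-D (a stand-in for the complex
# operational constraints the paper targets).
feasible = torch.randn(4096, 8)
feasible = feasible / feasible.norm(dim=1, keepdim=True)

ae = FeasibilityAE()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = ((ae(feasible) - feasible) ** 2).mean()
    loss.backward()
    opt.step()

# Correct infeasible predictions with one cheap forward pass instead of
# solving a constrained optimization problem.
infeasible = 3.0 * torch.randn(5, 8)
projected = ae(infeasible).detach()
print(projected.norm(dim=1))  # norms should sit close to 1
```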
From Model-Based Screening to Data-Driven Surrogates: A Multi-Stage Workflow for Exploring Stochastic Agent-Based Models
Paul Saves, Matthieu Mastio, Nicolas Verstaevel, Benoit Gaudou
Theory
  • Proposes a multi-stage workflow for exploring stochastic Agent-Based Models (ABMs).
  • Integrates automated model-based screening with machine learning surrogates.
  • Demonstrates methodology using a predator-prey case study.
  • Automates the discovery of unstable regions in parameter space.
Read more
Homophily-aware Supervised Contrastive Counterfactual Augmented Fair Graph Neural Network
Mahdi Tavassoli Kejani, Fadi Dornaika, Charlotte Laclau, Jean-Michel Loubes
Graph Learning
  • Introduces a two-phase training strategy for fairness-aware GNNs.
  • Enhances the CAF framework by incorporating graph editing and new loss functions.
  • Demonstrates improved performance in classification accuracy and fairness metrics.
  • Addresses topology bias by manipulating homophily ratios in the graph.
Read more
MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition
Seoungsub Lee, In Seo Kim, Seon Wook Kim
NLP Large Language Models Efficient ML
  • MUXQ introduces an auxiliary matrix to mitigate activation outliers in quantization.
  • The method enables uniform low-precision INT8 quantization while preserving accuracy.
  • Experiments show improved performance over existing quantization techniques like LLM.int8() and SmoothQuant.
  • MUXQ is designed to be hardware-friendly, particularly for neural processing units (NPUs).
Read more
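To make the decomposition concrete, here is a generic low-rank outlier split (not the MUXQ algorithm itself): peel off a rank-1 component that absorbs an outlier channel, then quantize the residual uniformly to INT8.
```python
# Illustrative only: "subtract a low-rank outlier component, then quantize
# the residual uniformly to INT8". Shows the flavor of outlier handling, not
# MUXQ's actual construction.
import numpy as np

def int8_quantize(x):
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 64))
X[:, 3] += 40.0  # an outlier channel that would wreck a single uniform scale

# Capture the dominant (outlier-carrying) direction with a rank-1 SVD term.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
low_rank = S[0] * np.outer(U[:, 0], Vt[0])
residual = X - low_rank

q_naive, s_naive = int8_quantize(X)
q_res, s_res = int8_quantize(residual)

err_naive = np.abs(q_naive * s_naive - X).mean()
err_split = np.abs((q_res * s_res + low_rank) - X).mean()
print(f"naive INT8 error: {err_naive:.4f}, low-rank + INT8 error: {err_split:.4f}")
```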
Do We Need Frontier Models to Verify Mathematical Proofs?
Aaditya Naik, Guruprerana Shabadi, Rajeev Alur, Mayur Naik
NLP Large Language Models Theory
  • Smaller open-source models can achieve nearly the same accuracy as frontier models in verifying mathematical proofs.
  • Self-consistency is a significant challenge for smaller models, which are up to 25% less consistent than frontier models.
  • Specialized prompt ensembles can significantly enhance the performance of smaller models in proof verification tasks.
  • The study provides insights into the capabilities of LLMs for mathematical reasoning and verification.
Read more
AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation
Seonggon Kim, Alireza Khodamoradi, Kristof Denolf, Eunhyeok Park
Large Language Models Efficient ML
  • First systematic analysis of outlier patterns in LLMs, identifying three types: Row-wise, Column-wise, and None.
  • Introduction of AdaHOP, which adapts Hadamard transforms based on the identified outlier patterns.
  • Achieves significant memory savings and acceleration in training while maintaining model quality.
  • Demonstrates the importance of tailored strategies for different tensor operations in low-precision training.
Read more
Can LLMs Learn to Reason Robustly under Noisy Supervision?
Shenzhi Yang, Guangcheng Zhu, Bowen Song, Sharon Li, Haobo Wang, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen
Large Language Models Reinforcement Learning Theory
  • Introduces a systematic analysis of noisy label mechanisms in RLVR.
  • Distinguishes between inactive and active noisy labels and their impacts on training.
  • Proposes Online Label Refinement (OLR) to correct noisy labels dynamically.
  • Demonstrates the Early Correctness Coherence phenomenon during training.
Read more
Collapse-Free Prototype Readout Layer for Transformer Encoders
Giansalvo Cirrincione, Rahul Ranjeev Kumar
Theory Efficient ML NLP
  • Introduction of DDCL-Attention, a prototype-based readout layer for transformers.
  • Mathematical guarantees against prototype collapse, together with formal training-stability results.
  • Versatile application in multiple paradigms, including readout layers and hierarchical compression.
  • Empirical validation showing effective prototype separation and high codebook utilization.
Read more
Multirate Stein Variational Gradient Descent for Efficient Bayesian Sampling
Arash Sarshar
Efficient ML Optimization Theory
  • Introduces multirate SVGD to separately handle drift and repulsion in Bayesian sampling.
  • Develops adaptive error-controlled multirate methods for improved stability and efficiency.
  • Demonstrates significant performance improvements over traditional SVGD in complex posterior scenarios.
  • Provides a comprehensive benchmark suite for evaluating the proposed methods.
Read more
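The drift/repulsion split is easy to see in code. Below is a toy SVGD loop with two separate step sizes standing in for the multirate idea; the fixed bandwidth and step sizes are illustrative, not the paper's adaptive error-controlled scheme.
```python
# Toy SVGD with separate step sizes for the drift (kernel-weighted score)
# and repulsion (kernel-gradient) terms: the split a multirate scheme exploits.
import numpy as np

def rbf_terms(X, h):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * h))
    # sum over j of grad_{x_j} k(x_j, x_i), for every particle i
    grad_K = (K[:, :, None] * (X[None, :, :] - X[:, None, :]) / h).sum(0)
    return K, grad_K

def score(X):  # target: standard 2-D Gaussian, so grad log p(x) = -x
    return -X

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(200, 2))   # particles start far from the target
eps_drift, eps_repulse = 0.1, 0.02       # two rates, one per term

for _ in range(300):
    K, grad_K = rbf_terms(X, h=1.0)
    X += eps_drift * (K @ score(X)) / len(X) + eps_repulse * grad_K / len(X)

print(X.mean(0), X.std(0))  # should drift toward mean 0, spread toward std 1
```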
Forgetting to Witness: Efficient Federated Unlearning and Its Visible Evaluation
Houzhe Wang, Xiaojie Zhu, Chi Chen
Federated Learning Generative Models Efficient ML
  • Introduction of the first complete pipeline for federated unlearning.
  • Development of an efficient unlearning approach that does not require historical data.
  • Creation of the Skyeye framework for visualizing the forgetting capacity of unlearning models.
  • Utilization of knowledge distillation to facilitate the unlearning process.
Read more
The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models
Prashant C. Raju
Theory
  • Introduction of the Geometric Alignment Tax, highlighting the cost of discretizing continuous manifolds.
  • Controlled experiments show that continuous objectives significantly outperform discrete tokenization in geometric stability.
  • Identification of three failure regimes in biological foundation models, revealing systematic issues in representation.
  • Demonstration that finer quantization in learned codebooks can worsen geometric stability despite better reconstruction.
Read more
Physical Sensitivity Kernels Can Emerge in Data-Driven Forward Models: Evidence From Surface-Wave Dispersion
Ziye Yu, Yuqi Cai, Xin Liu
Theory Interpretability Optimization
  • Neural network surrogates can recover depth-dependent structures of surface-wave sensitivity kernels.
  • Learned sensitivities are influenced by both wave physics and the training distribution.
  • Surrogate gradients and Fisher information can effectively capture local inverse-problem geometry for inversion.
  • Emergent differential physics allows data-driven models to recover physical structures from observable data.
Read more
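The probing step implied here is a short autograd computation: differentiate a surrogate's scalar output with respect to its input profile and read the gradient off as a learned sensitivity kernel. The untrained toy surrogate below is purely illustrative.
```python
# Sketch: input-gradients of a surrogate as learned "sensitivity kernels".
import torch
import torch.nn as nn

surrogate = nn.Sequential(nn.Linear(50, 64), nn.Tanh(), nn.Linear(64, 1))

profile = torch.randn(50, requires_grad=True)  # e.g., a velocity-depth profile
pred = surrogate(profile)                      # e.g., dispersion at one period
pred.backward()
sensitivity = profile.grad                     # d(prediction) / d(profile)
print(sensitivity.shape)                       # one weight per depth point
```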
Towards Realistic Class-Incremental Learning with Free-Flow Increments
Zhiming Xu, Baile Xu, Jian Zhao, Furao Shen, Suorong Yang
Theory Optimization
  • Introduction of Free-Flow Class-Incremental Learning (FFCIL) to address realistic class arrival scenarios.
  • Development of a model-agnostic framework that stabilizes learning through a class-wise mean objective.
  • Method-wise adaptations to enhance robustness, including replay-constrained distillation and loss scale normalization.
  • Extensive experiments reveal significant performance drops in existing CIL methods under FFCIL, while the proposed approach shows consistent improvements.
Read more
The limits of bio-molecular modeling with large language models: a cross-scale evaluation
Yaxin Xu, Yue Zhou, Tianyu Zhao, Fengwei An, Zhixiang Ren
Large Language Models
  • Introduction of BioMol-LLM-Bench for evaluating LLMs in bio-molecular tasks.
  • Limited benefits of chain-of-thought data in biological modeling.
  • Hybrid Mamba-attention architectures outperform traditional transformers on long sequences.
  • Supervised fine-tuning improves specialization but reduces generalization.
Read more
Restless Bandits with Individual Penalty Constraints: A New Near-Optimal Index Policy and How to Learn It
Nida Zamir, I-Hong Hou
Reinforcement Learning Optimization Theory
  • Introduction of a new RMAB framework that incorporates individual penalty constraints for users.
  • Development of the Penalty-Optimal Whittle (POW) index policy, which is asymptotically optimal and computationally tractable.
  • Proposal of the DeepPOW algorithm for online learning of the POW index without prior knowledge of user dynamics.
  • Comprehensive simulations validate the effectiveness of the POW index policy and DeepPOW in various network scheduling scenarios.
Read more
Evaluation of Bagging Predictors with Kernel Density Estimation and Bagging Score
Philipp Seitz, Jan Schmitt, Andreas Schiffler
Theory Efficient ML
  • Introduces a new method for evaluating bagging predictors using Kernel Density Estimation.
  • Presents the Bagging Score as a confidence metric for ensemble predictions.
  • Demonstrates improved prediction accuracy over traditional mean or median methods.
  • Ranks highly against existing nonlinear regression approaches without optimization.
Read more
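One plausible reading of the mechanism, sketched below with assumed details (bandwidth, score definition): fit a KDE over the ensemble members' predictions for each sample, predict the density mode instead of the mean, and treat the peak density as a confidence-like Bagging Score.
```python
# Hedged sketch of a KDE-based readout for a bagged ensemble; the specifics
# are assumptions, not the paper's estimator.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=500)

bag = BaggingRegressor(DecisionTreeRegressor(max_depth=4), n_estimators=50).fit(X, y)

x_new = np.array([[1.0]])
member_preds = np.array([est.predict(x_new)[0] for est in bag.estimators_])

kde = gaussian_kde(member_preds)
grid = np.linspace(member_preds.min(), member_preds.max(), 200)
density = kde(grid)
mode_pred = grid[density.argmax()]   # KDE-mode prediction instead of the mean
confidence = density.max()           # higher peak = more concentrated ensemble
print(mode_pred, confidence, member_preds.mean())
```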
Learning from Equivalence Queries, Revisited
Mark Braverman, Roi Livni, Yishay Mansour, Shay Moran, Kobbi Nissim
Theory Efficient ML Interpretability
  • Introduces symmetric counterexample generators to reduce adversarial behavior in learning from equivalence queries.
  • Establishes tight bounds on learning rounds under both full-information and bandit feedback settings.
  • Combines game-theoretic perspectives with adaptive weighting algorithms for improved learning efficiency.
  • Retains the requirement for proper hypothesis proposals to ensure computational efficiency and interpretability.
Read more
Anticipatory Reinforcement Learning: From Generative Path-Laws to Distributional Value Functions
Daniel Bloch
Reinforcement Learning Theory Time Series
  • Introduces Anticipatory Reinforcement Learning (ARL) framework for non-Markovian environments.
  • Utilizes a signature-augmented manifold for dynamic path-law representation.
  • Enables 'Single-Pass' policy evaluation, reducing computational complexity.
  • Develops a generative engine based on Neural Controlled Differential Equations (CDEs).
Read more
Gradient Boosting within a Single Attention Layer
Saleh Sargolzaei
NLP Large Language Models Theory
  • Introduction of gradient-boosted attention, a multi-round attention mechanism.
  • Demonstrates a formal correspondence to gradient boosting under a squared reconstruction objective.
  • Shows that separate projections for correction can recover residual information lost in standard attention.
  • Achieves significant improvements in test perplexity over standard and alternative attention mechanisms.
Read more
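The boosting correspondence can be illustrated by treating the softmax attention matrix as a fixed linear smoother and applying it in rounds, each round fitting the residual of a squared reconstruction objective. This toy sketch shows the mechanism only, not the paper's full architecture with separate correction projections.
```python
# Boosting view of attention: repeated residual fitting with a fixed
# row-stochastic smoother A built from the inputs.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(16, 32)              # (tokens, dim)
target = torch.randn(16, 32)         # signal the layer should reconstruct

scores = x @ x.T / x.shape[-1] ** 0.5
A = F.softmax(scores, dim=-1)        # one attention matrix, reused each round

out = torch.zeros_like(target)
for r in range(10):
    residual = target - out          # what earlier rounds missed
    out = out + A @ residual         # boosting step with base learner A
    print(f"round {r}: residual norm {residual.norm():.3f}")
```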
Algebraic Diversity: Group-Theoretic Spectral Estimation from Single Observations
Mitchell A. Thornton
Theory Efficient ML Audio & Speech
  • Introduces algebraic group actions as a method for spectral estimation from single observations.
  • Establishes a General Replacement Theorem for consistent estimation of subspace decomposition.
  • Demonstrates the optimality of the symmetric group for achieving superior spectral decomposition.
  • Applies the framework to various domains, achieving significant performance improvements over traditional methods.
Read more
Beyond Imbalance Ratio: Data Characteristics as Critical Moderators of Oversampling Method Selection
Yuwen Jiang, Songyun Ye
Theory
  • The imbalance ratio (IR) alone is a weak predictor of oversampling effectiveness; class separability is a stronger moderator.
  • The study provides a new framework for method selection that considers multiple data characteristics.
  • Controlled experiments reveal negative correlations between IR and oversampling benefits, challenging previous literature.
  • The findings highlight the need for evidence-based guidelines in selecting oversampling techniques.
Read more
BWTA: Accurate and Efficient Binarized Transformer by Algorithm-Hardware Co-design
Yifu Ding, Xianglong Liu, Shenghao Jin, Jinyang Guo, Jiwen Lu
NLP Large Language Models Efficient ML
  • Introduces Binary Weights and Ternary Activations (BWTA) for improved quantization in Transformers.
  • Develops a Smooth Multi-Stage Quantization framework for stable training and convergence.
  • Creates a custom BWTA MatMul CUDA kernel for efficient GPU execution.
  • Achieves near full-precision performance for BERT with minimal accuracy drop.
Read more
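For reference, minimal versions of the two quantizers named above, under standard formulations (assumptions, not the paper's exact definitions); the multi-stage training schedule and the CUDA kernel are out of scope here.
```python
# Sign-based binary weights with a per-tensor scale, and threshold-based
# ternary activations: the two ingredients BWTA combines.
import torch

def binarize_weights(w):
    scale = w.abs().mean()                  # per-tensor scale
    return torch.sign(w) * scale            # values in {-scale, +scale}

def ternarize_activations(x, threshold=0.5):
    return torch.sign(x) * (x.abs() > threshold).float()  # values in {-1, 0, +1}

w, x = torch.randn(4, 4), torch.randn(4)
print(binarize_weights(w))
print(ternarize_activations(x))
```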
Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits
Adam Bayley, Xiaodan Zhu, Raquel Aoki, Yanshuai Cao, Kevin H. Wilson
Theory Large Language Models Reinforcement Learning
  • Introduces a Noisy-CBLI framework to evaluate the impact of noise in LLM-generated preference data.
  • Empirical results show that warm-starting benefits diminish and can become harmful beyond 30% noise.
  • Systematic misalignment of LLM preferences can lead to higher regret than cold-start bandits.
  • Develops a theoretical analysis linking prior-error to performance outcomes in bandit algorithms.
Read more
Causal-Audit: A Framework for Risk Assessment of Assumption Violations in Time-Series Causal Discovery
Marco Ruiz, Miguel Arana-Catania, David R. Ardila, Rodrigo Ventura
Time Series
  • Causal-Audit formalizes assumption validation as calibrated risk assessment.
  • The framework computes risk scores across five assumption families and provides uncertainty intervals.
  • An abstention-aware decision policy is implemented to guide method selection based on risk scores.
  • Evaluation shows high calibration accuracy (AUROC > 0.95) and significant false positive reduction.
Read more
A Clinical Point Cloud Paradigm for In-Hospital Mortality Prediction from Multi-Level Incomplete Multimodal EHRs
Bohao Li, Tao Zou, Junchen Ye, Yan Gong, Bowen Du
Multimodal
  • Introduces HealthPoint (HP) for modeling multi-level incomplete EHRs.
  • Utilizes a 4D coordinate system to represent clinical events as points.
  • Employs Low-Rank Relational Attention for capturing high-order dependencies.
  • Demonstrates state-of-the-art performance in mortality prediction.
Read more
The Role of Generator Access in Autoregressive Post-Training
Amit Kiran Rege
NLP Large Language Models Theory
  • Generator access significantly influences the effectiveness of autoregressive post-training.
  • Prefix control is a primary boundary that affects learning outcomes.
  • Weak local reset can eliminate barriers to reaching informative prefixes.
  • Observation richness becomes meaningful only after prefix control is granted.
Read more
WGFINNs: Weak formulation-based GENERIC formalism informed neural networks
Jun Sur Richard Park, Auroni Huque Hashim, Siu Wun Cheung, Youngsoo Choi, Yeonjong Shin
Theory Interpretability
  • WGFINNs enhance robustness to noisy data compared to GFINNs.
  • The weak formulation approach allows for accurate modeling even in the presence of noise.
  • Incorporation of state-wise weighted loss and residual-based attention improves performance.
  • Theoretical analysis supports the effectiveness of weak formulations in maintaining trajectory consistency.
Read more
Understanding Latent Diffusability via Fisher Geometry
Jing Gu, Morteza Mardani, Wonjun Lee, Dongmian Zou, Gilad Lerman
Generative Models Theory Efficient ML
  • Introduces a theoretical framework linking latent diffusability with Fisher Information Geometry.
  • Identifies and quantifies geometric distortions affecting latent diffusion performance.
  • Derives conditions for preserving Fisher Information Rate (FIR) to ensure stable diffusion.
  • Validates the framework through experiments, showing that standard VAEs often exhibit significant FIR deviations.
Read more
Delayed Homomorphic Reinforcement Learning for Environments with Delayed Feedback
Jongsoo Lee, Jangwon Kim, Soohee Han
Reinforcement Learning Robotics Theory
  • Introduces Delayed Homomorphic Reinforcement Learning (DHRL) to address delayed feedback in RL.
  • Utilizes MDP homomorphisms to create a compact abstract MDP, improving sample efficiency.
  • Presents two algorithms: DHVI for finite domains and D2HPG for continuous domains.
  • Demonstrates superior performance of DHRL over traditional augmentation-based methods in long-delay environments.
Read more
Empowering Power Outage Prediction with Spatially Aware Hybrid Graph Neural Networks and Contrastive Learning
Xuyang Shen, Zijie Pan, Diego Cerrai, Xinxuan Zhang, Christopher Colorio, Emmanouil N. Anagnostou, Dongjin Song
Graph Learning
  • Introduction of SA-HGNN, a hybrid model integrating static and dynamic spatial dependencies for power outage prediction.
  • Development of a dynamic graph learning module to capture complex spatial relationships across different weather events.
  • Use of contrastive learning to generate location-specific embeddings, addressing the imbalance in outage datasets.
  • Empirical studies show SA-HGNN outperforms existing models in four utility service territories.
Read more
SIEVE: Sample-Efficient Parametric Learning from Natural Language
Parth Asawa, Alexandros G. Dimakis, Matei Zaharia
NLP Large Language Models Efficient ML
  • SIEVE enables sample-efficient parametric learning from natural language context using as few as three examples.
  • The method utilizes a synthetic data generation pipeline, SIEVE-GEN, which decomposes context to create high-quality training data.
  • Empirical results show that SIEVE outperforms prior context distillation methods and can match or exceed ICL performance without context at inference time.
  • The approach allows for persistent improvements in model performance with minimal input, making parametric learning more practical.
Read more
Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics
Aniketh Iyengar, Jiaqi Han, Pengwei Sun, Mingjian Jiang, Jianwen Xie, Stefano Ermon
Generative Models
  • Introduces a two-stage framework for generating MD trajectories that combines structure pretraining and temporal interpolation.
  • Addresses data scarcity in MD simulations by leveraging large-scale conformer datasets.
  • Implements an equivariant temporal interpolator to model temporal dependencies in molecular dynamics.
  • Demonstrates improved accuracy in generating chemically realistic MD trajectories across various molecular systems.
Read more
Where to Steer: Input-Dependent Layer Selection for Steering Improves LLM Alignment
Soham Gadgil, Chris Lin, Su-In Lee
NLP Large Language Models Theory
  • The assumption of a globally fixed intervention layer for steering vectors is fundamentally limited.
  • Different inputs may require steering at different layers to align with target behaviors.
  • The W2S framework learns to predict the optimal steering layer based on input embeddings.
  • W2S consistently outperforms traditional fixed-layer steering methods across various datasets.
Read more
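Mechanically, layer-wise steering is a forward hook that adds a vector to one layer's output. In the sketch below, pick_layer is a stand-in for the learned W2S selector; the toy model and the heuristic are assumptions for illustration only.
```python
# Input-dependent steering via a forward hook on a chosen layer.
import torch
import torch.nn as nn

model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(6)])  # toy "layers"
steer_vec = 0.5 * torch.randn(16)

def pick_layer(x):
    # stand-in for W2S, which predicts the layer from input embeddings
    return int(x.abs().mean().item() * 10) % len(model)

def steer(module, inputs, output):
    return output + steer_vec          # returning a value replaces the output

x = torch.randn(1, 16)
layer_idx = pick_layer(x)
handle = model[layer_idx].register_forward_hook(steer)
out = model(x)
handle.remove()
print(f"steered at layer {layer_idx}:", out.shape)
```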
Supervised Dimensionality Reduction Revisited: Why LDA on Frozen CNN Features Deserves a Second Look
Indar Kumar, Girish Karhana, Sai Krishna Jasti, Ankit Hemant Lade
Computer Vision Efficient ML Theory
  • LDA consistently improves classification accuracy over full-dimensional features across multiple architectures and datasets.
  • Dimensionality reduction using LDA can reduce feature dimensionality by 61-95% while enhancing accuracy.
  • LDA outperforms PCA and more complex alternatives in both accuracy and computational cost.
  • Two lightweight extensions to LDA are introduced, offering slight accuracy improvements.
Read more
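The recipe is a few lines of scikit-learn. The digits features below stand in for frozen penultimate-layer CNN activations; LDA reduces to at most (classes - 1) dimensions before a lightweight classifier.
```python
# LDA as supervised dimensionality reduction on frozen features.
from sklearn.datasets import load_digits
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)        # 64-D features, 10 classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# LDA projects 64-D features to at most 9 dimensions (classes - 1).
clf = make_pipeline(LinearDiscriminantAnalysis(n_components=9),
                    LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
print("accuracy in 9-D LDA space:", clf.score(X_te, y_te))

# Baseline on the full 64-D features for comparison.
base = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy on full features:", base.score(X_te, y_te))
```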
Generative Frontiers: Why Evaluation Matters for Diffusion Language Models
Patrick Pynadath, Jiaxin Shi, Ruqi Zhang
NLP Large Language Models Generative Models
  • Diffusion language models offer greater flexibility in generative tasks than autoregressive models.
  • Current evaluation methodologies, particularly likelihood-based metrics, are inadequate for assessing dLLMs.
  • Generative perplexity and entropy can be decomposed into KL divergence components, leading to a new evaluation framework.
  • OpenWebText is recommended as the standard pretraining dataset for dLLMs over LM1B.
Read more
ArrowFlow: Hierarchical Machine Learning in the Space of Permutations
Ozgur Yilmaz
Theory Efficient ML
  • Introduces ArrowFlow, a permutation-based machine learning architecture.
  • Utilizes ranking filters and hierarchical learning without floating-point parameters.
  • Demonstrates robustness to monotone batch effects in gene expression data.
  • Achieves competitive accuracy across various datasets, including UCI and MNIST.
Read more
Simple yet Effective: Low-Rank Spatial Attention for Neural Operators
Zherui Yang, Haiyang Xin, Tao Du, Ligang Liu
Theory Efficient ML
  • Introduction of Low-Rank Spatial Attention (LRSA) for neural operators.
  • Unification of global mixing modules under a low-rank perspective.
  • Use of standard Transformer primitives for simplicity and efficiency.
  • Achieved over 17% error reduction compared to existing methods.
Read more
Autoencoder-Based Parameter Estimation for Superposed Multi-Component Damped Sinusoidal Signals
Momoka Iida, Hayato Motohashi, Hirotaka Takahashi
Time Series
  • Development of an autoencoder-based method for parameter estimation of damped sinusoidal signals.
  • Evaluation of the method's performance under Gaussian and uniform training data distributions.
  • High accuracy in parameter estimation even in complex scenarios with overlapping components.
  • Robustness of the method against noise and less informative training distributions.
Read more
Investigating Data Interventions for Subgroup Fairness: An ICU Case Study
Erin Tan, Judy Hanwen Shen, Irene Y. Chen
Theory
  • Algorithmic bias in healthcare can exacerbate disparities among subgroups.
  • Combining data sources may lead to unpredictable effects on model fairness and performance.
  • Common data addition strategies are often ineffective and can introduce distribution shifts.
  • A hybrid approach of data-centric methods and model calibration is most effective for improving subgroup performance.
Read more
Complex-Valued GNNs for Distributed Basis-Invariant Control of Planar Systems
Samuel Honor, Mohamed Abdelnaby, Kevin Leahy
Graph Learning Robotics Theory
  • Introduces a complex-valued GNN architecture for distributed control of planar systems.
  • Achieves global invariance to local basis choices, enhancing applicability in GPS-denied environments.
  • Utilizes complex-valued linear layers and phase-equivariant activation functions.
  • Demonstrates improved data efficiency and tracking performance over traditional real-valued GNNs.
Read more
Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems
Justin Chih-Yao Chen, Archiki Prasad, Zaid Khan, Joykirat Singh, Runchu Tian, Elias Stengel-Eskin, Mohit Bansal
NLP Large Language Models Reinforcement Learning
  • Cog-DRIFT reformulates difficult reasoning problems into simpler task formats to facilitate learning.
  • The framework utilizes an adaptive curriculum that progresses from easier to harder tasks based on model performance.
  • Significant performance improvements were observed on challenging reasoning benchmarks, with gains of +10.11% for Qwen and +8.64% for Llama.
  • The method shows strong generalization capabilities across held-out datasets.
Read more
Integrating Artificial Intelligence, Physics, and Internet of Things: A Framework for Cultural Heritage Conservation
Carmine Valentino, Federico Pichi, Francesco Colace, Dajana Conte, Gianluigi Rozza
Theory
  • Integration of AI, IoT, and physical knowledge for cultural heritage conservation.
  • Development of a four-layer framework for effective monitoring and predictive maintenance.
  • Utilization of Physics-Informed Neural Networks (PINNs) for enhanced simulation accuracy.
  • Incorporation of Reduced Order Methods (ROMs) to improve computational efficiency.
Read more
Hierarchical Planning with Latent World Models
Wancong Zhang, Basile Terver, Artem Zholus, Soham Chitnis, Harsh Sutaria, Mido Assran, Randall Balestriero, Amir Bar, Adrien Bardes, Yann LeCun, Nicolas Ballas
Reinforcement Learning Robotics Optimization
  • Introduces a hierarchical planning framework that mitigates prediction errors in long-horizon control tasks.
  • Achieves a 70% success rate in real-world robotic tasks using zero-shot control with final goal specifications.
  • Demonstrates up to 3× reduction in planning-time compute compared to traditional methods.
  • Applies across diverse latent world-model architectures, enhancing generalizability.
Read more
Fast NF4 Dequantization Kernels for Large Language Model Inference
Xiangbo Qi, Chaoyi Jiang, Murali Annavaram
Large Language Models Efficient ML Optimization
  • Identification of dequantization as a critical bottleneck in LLM inference.
  • Development of a lightweight shared memory optimization that reduces latency.
  • Achieved 2.0–2.2× speedup in kernel execution and 1.54× end-to-end improvement.
  • Compatibility with HuggingFace ecosystem facilitates easy adoption.
Read more
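For orientation, NF4-style dequantization is a per-block codebook gather; the paper's contribution is doing this fast on GPU, which a NumPy sketch obviously does not capture. The 16-entry table below is a placeholder, not the actual NF4 codebook.
```python
# Codebook dequantization: each 4-bit index selects one of 16 fixed values,
# rescaled per block. Placeholder codebook, illustrative shapes.
import numpy as np

codebook = np.linspace(-1.0, 1.0, 16).astype(np.float32)    # placeholder table
rng = np.random.default_rng(0)
indices = rng.integers(0, 16, size=(4, 64), dtype=np.uint8)  # 4 blocks of 64
block_scales = np.array([0.5, 1.0, 2.0, 0.25], dtype=np.float32)

# Dequantize: gather from the table, then rescale per block.
dequant = codebook[indices] * block_scales[:, None]
print(dequant.shape, dequant.dtype)  # (4, 64) float32
```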
How Long short-term memory artificial neural network, synthetic data, and fine-tuning improve the classification of raw EEG data
Albert Nasybullin, Vladimir Maksimenko, Semen Kurkin
Time Series
  • Integration of synthetic data generation with LSTM networks enhances EEG classification.
  • The study addresses the limitations of traditional machine learning methods in EEG data analysis.
  • Fine-tuning techniques are crucial for improving model performance on ambiguous visual stimuli.
  • The experimental design involved a well-defined dataset with controlled ambiguity levels.
Read more
Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference
Yifu Ding, Xinhao Zhang, Jinyang Guo
Large Language Models Efficient ML
  • Introduction of Diagonal-Tiled Mixed-Precision Attention (DMA) for efficient LLM inference.
  • Development of a fully fused GPU kernel that integrates multiple processes into one workflow.
  • Demonstration of lossless generation quality compared to traditional full-precision methods.
  • Significant speedup achieved through kernel fusion and efficient memory usage.
Read more
Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training
Cunyang Wei, Siddharth Singh, Aishwarya Sarkar, Daniel Nichols, Tisha Patel, Aditya K. Ranjan, Sayan Ghosh, Ali Jannesari, Nathan R. Tallent, Abhinav Bhatele
Graph Learning
  • Introduction of ScaleGNN, a 4D parallel framework for mini-batch GNN training.
  • Communication-free distributed sampling algorithm that enhances efficiency.
  • 3D parallel matrix multiplication (PMM) for improved scalability.
  • Achieved significant speedup and maintained high accuracy on large datasets.
Read more
SODA: Semi On-Policy Black-Box Distillation for Large Language Models
Xiwen Chen, Jingjing Wang, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hejian Sang, Zhipeng Wang, Alborz Geramifard, Feng Luo
NLP Large Language Models Efficient ML
  • Introduction of semi on-policy distillation, utilizing static snapshots for effective training.
  • SODA achieves up to 10× faster training and 27% less peak GPU memory usage compared to GAD.
  • Outperforms state-of-the-art methods on 15 out of 16 benchmark results.
  • Eliminates the need for adversarial training and additional models, simplifying the distillation process.
Read more
Reflective Context Learning: Studying the Optimization Primitives of Context Space
Nikita Vassilyev, William Berrios, Ruowang Zhang, Bo Han, Douwe Kiela, Shikib Mehri
Optimization Reinforcement Learning Theory
  • Introduction of Reflective Context Learning (RCL) as a unified framework for context optimization.
  • Emphasis on reflection as a mechanism to generate update signals for context space learning.
  • Integration of classical optimization techniques to enhance learning in context space.
  • Demonstrated improvements over strong baselines across multiple benchmarks.
Read more
Conditional Sampling via Wasserstein Autoencoders and Triangular Transport
Mohammad Al-Jarrah, Michele Martino, Marcus Yim, Bamdad Hosseini, Amirhossein Taghvaei
Generative Models Theory Efficient ML
  • Introduction of Conditional Wasserstein Autoencoders (CWAEs) for conditional sampling.
  • Utilization of block-triangular decoders to exploit low-dimensional structures.
  • Demonstration of substantial error reductions compared to traditional methods like LREnKF.
  • Theoretical connections to conditional optimal transport problems.
Read more
Entropy, Disagreement, and the Limits of Foundation Models in Genomics
Maxime Rochkoulets, Lovro Vrček, Mile Šikić
Theory
  • High entropy in genomic sequences leads to uncertainty in model predictions.
  • Models trained on DNA exhibit significant disagreement and instability compared to text models.
  • Fisher information is concentrated in embedding layers of genomic models, limiting inter-token relationship exploitation.
  • Self-supervised training from sequences alone may not be effective for genomic data.
Read more
Which Leakage Types Matter?
Simon Roth
Theory
  • Estimation leakage has minimal impact on model performance, with ΔAUC ≤ 0.005.
  • Selection leakage significantly inflates performance metrics, with effects ranging from +0.013 to +0.045.
  • Memorization leakage produces the largest raw effects, varying by model capacity.
  • Boundary leakage is often undetected in standard cross-validation, masking its impact.
Read more
ROMAN: A Multiscale Routing Operator for Convolutional Time Series Models
Gonzalo Uribarri
Time Series
  • ROMAN is a deterministic operator that enhances time series representation by creating a structured channel format.
  • It allows for better control over inductive biases in convolutional models, improving temporal awareness and multiscale interactions.
  • The operator was evaluated through synthetic tasks and real datasets, showing task-dependent improvements in accuracy and efficiency.
  • ROMAN can be integrated with existing convolutional architectures without replacing them, serving as a preprocessing step.
Read more
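A rough illustration of a deterministic multiscale channel expansion used as a preprocessing step, in the spirit of what the summary describes; the moving-average construction below is a guess for illustration, not the ROMAN operator.
```python
# Turn a univariate series into structured channels (identity plus moving
# averages at several scales) before feeding a convolutional model.
import numpy as np

def multiscale_channels(x, scales=(2, 4, 8)):
    channels = [x]
    for s in scales:
        kernel = np.ones(s) / s
        channels.append(np.convolve(x, kernel, mode="same"))  # scale-s smoothing
    return np.stack(channels)               # (1 + len(scales), time)

t = np.linspace(0, 10, 500)
series = np.sin(t) + 0.3 * np.random.default_rng(0).normal(size=500)
print(multiscale_channels(series).shape)    # (4, 500)
```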
Re-analysis of the Human Transcription Factor Atlas Recovers TF-Specific Signatures from Pooled Single-Cell Screens with Missing Controls
Arka Jain, Umesh Sharma
Theory
  • Successfully re-analyzed the Human TF Atlas dataset to recover TF-specific signatures despite missing controls.
  • Assigned TF identities to 60,997 cells, significantly improving the detection of transcriptional effects.
  • Identified strong transcriptional remodelers and linked them to specific biological pathways.
  • Demonstrated the importance of using external controls for accurate differential expression analysis.
Read more
Isokinetic Flow Matching for Pathwise Straightening of Generative Flows
Tauhid Khan
Generative Models Efficient ML Optimization
  • Introduction of Isokinetic Flow Matching (Iso-FM) as a solution to curvature issues in flow-based generative models.
  • Iso-FM utilizes a Jacobian-free regularizer to penalize pathwise acceleration, enhancing local velocity consistency.
  • Demonstrated significant improvements in few-step sampling efficiency on CIFAR-10, with a 2.9× relative efficiency gain.
  • Empirical results show a reduction in conditional non-OT FID@2 from 78.82 to 27.13.
Read more
Adaptive Semantic Communication for Wireless Image Transmission Leveraging Mixture-of-Experts Mechanism
Haowen Wan, Qianqian Yang
Computer Vision Efficient ML Multimodal
  • Introduces a novel multi-stage end-to-end image semantic communication system.
  • Utilizes a dynamic expert gating mechanism for adaptive routing based on CSI and image semantics.
  • Achieves significant improvements in image reconstruction quality over existing methods.
  • Maintains high transmission efficiency despite increased model adaptability.
Read more
Generalization Limits of Reinforcement Learning Alignment
Haruhi Shida, Koo Imai, Keigo Kansa
NLP Large Language Models Reinforcement Learning
  • RLHF does not acquire new capabilities but redistributes existing ones, limiting generalization to unknown attacks.
  • The introduction of 'compound jailbreaks' effectively demonstrates vulnerabilities in LLM safety mechanisms.
  • The attack success rate significantly increases when combining multiple attack techniques, indicating weaknesses in individual defenses.
  • Safety training may overfit to training data, failing to generalize to diverse attack patterns.
Read more
DrugPlayGround: Benchmarking Large Language Models and Embeddings for Drug Discovery
Tianyu Liu, Sihan Jiang, Fan Zhang, Kunyang Sun, Teresa Head-Gordon, Hongyu Zhao
NLP Large Language Models Multimodal
  • DrugPlayGround is a comprehensive benchmarking platform for evaluating LLMs in drug discovery.
  • The framework assesses LLM performance across four critical drug discovery tasks.
  • Collaboration with domain experts is emphasized to enhance the reliability of LLM predictions.
  • The study provides insights into the strengths and limitations of LLMs in pharmaceutical applications.
Read more
Time-Warping Recurrent Neural Networks for Transfer Learning
Jonathon Hirschi
Time Series
  • Introduces a time-warping approach for transfer learning in RNNs.
  • Demonstrates the ability of LSTMs to approximate time lag models with high accuracy.
  • Applies the method to predict fuel moisture content, relevant for wildfire modeling.
  • Achieves competitive accuracy compared to traditional transfer learning methods with fewer parameter adjustments.
Read more
SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models
Ziwei Li, Yuang Ma, Yi Kang
Large Language Models Efficient ML Optimization
  • SLaB combines sparse, low-rank, and binary matrix decomposition for efficient LLM compression.
  • The framework eliminates the need for retraining, making it computationally efficient.
  • SLaB achieves a 36% reduction in perplexity and an 8.98% accuracy improvement on zero-shot tasks.
  • Activation-aware pruning scores are utilized to guide the decomposition process.
Read more
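A hedged sketch of a three-term decomposition of this kind: a few large-magnitude entries (sparse), a truncated SVD of the residual (low-rank), and a scaled sign matrix for what remains (binary). The greedy ordering and term sizes are assumptions, not SLaB's procedure.
```python
# Greedy sparse + low-rank + binary decomposition of a weight matrix.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))

# 1) Sparse: keep the 2% largest-magnitude entries.
S = np.where(np.abs(W) > np.quantile(np.abs(W), 0.98), W, 0.0)
R = W - S

# 2) Low-rank: rank-8 truncated SVD of the residual.
U, sv, Vt = np.linalg.svd(R, full_matrices=False)
L = U[:, :8] * sv[:8] @ Vt[:8]
R = R - L

# 3) Binary: scaled sign matrix (alpha = mean|R| minimizes squared error).
B = np.abs(R).mean() * np.sign(R)

approx = S + L + B
print("relative error:", np.linalg.norm(W - approx) / np.linalg.norm(W))
```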
Neural Operators for Multi-Task Control and Adaptation
David Sewell, Xingjian Li, Stepan Tretiakov, Krishna Kumar, David Fridovich-Keil
Reinforcement Learning Robotics Optimization
  • Neural operators are established as effective models for multi-task control, capable of approximating mappings from task-defining functions to optimal policies.
  • The architecture allows for structured adaptation strategies, enabling efficient updates and fine-tuning for new tasks.
  • Meta-trained variants of the operator facilitate rapid few-shot adaptation, improving performance with limited data.
  • The approach generalizes well across different task distributions and varying amounts of training data.
Read more
AXELRAM: Quantize Once, Never Dequantize
Yasushi Nishida
Large Language Models Efficient ML Optimization
  • AXELRAM enables computation of attention scores directly from quantized KV cache indices, eliminating the need for dequantization.
  • The architecture achieves a 102.4× reduction in multiplications required for attention score computation.
  • Sign pattern sensitivity in KV cache quantization can lead to significant performance degradation in certain models.
  • A gradient-free sign pattern selection method is proposed to address catastrophic spikes in perplexity.
Read more
MAVEN: A Mesh-Aware Volumetric Encoding Network for Simulating 3D Flexible Deformation
Zhe Feng, Shilong Tao, Haonan Sun, Shaohan Chen, Zhanxing Zhu, Yunhuai Liu
Graph Learning
  • MAVEN introduces a mesh-aware approach that incorporates higher-dimensional geometric features for improved simulation accuracy.
  • The model enhances the representation of contact interactions and internal physical propagation by utilizing 3D cells and 2D facets.
  • MAVEN consistently outperforms existing GNN-based methods on established datasets and specific tasks involving complex deformations.
  • The architecture employs learnable position-aware aggregators to facilitate information propagation through higher-dimensional structures.
Read more
k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS
Jonas De Schouwer, Haitz Sáez de Ocáriz Borde, Xiaowen Dong
Graph Learning Efficient ML Theory
  • Introduction of k-Maximum Inner Product (k-MIP) attention for graph transformers.
  • Achieves linear memory complexity and significant speed improvements over traditional attention mechanisms.
  • Proves that k-MIP attention retains the expressive power of full-attention transformers.
  • Integrates seamlessly into the GraphGPS framework with established theoretical bounds.
Read more
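The core operation is simple to state: each query attends only to its k largest-inner-product keys, with the softmax restricted to that set. The dense top-k below is for clarity; the linear-memory claim relies on serving this with MIPS data structures instead.
```python
# Top-k maximum-inner-product attention, dense reference version.
import torch
import torch.nn.functional as F

def k_mip_attention(q, k, v, top_k=4):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (nq, nk)
    vals, idx = scores.topk(top_k, dim=-1)                  # keep k best keys
    weights = F.softmax(vals, dim=-1)                       # softmax over k only
    return torch.einsum("qk,qkd->qd", weights, v[idx])      # gather + mix

torch.manual_seed(0)
q, k, v = torch.randn(8, 16), torch.randn(32, 16), torch.randn(32, 16)
out = k_mip_attention(q, k, v)
print(out.shape)  # torch.Size([8, 16])
```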
Dynamic Free-Rider Detection in Federated Learning via Simulated Attack Patterns
Motoki Nakamura
Federated Learning
  • Introduces S2-WEF, a novel method for detecting dynamic free-riders in Federated Learning.
  • Demonstrates the limitations of existing WEF-defense methods against dynamic free-rider attacks.
  • Combines simulation-based similarity scores with mutual deviation scores for improved detection accuracy.
  • Validates the effectiveness of S2-WEF through extensive experiments on multiple datasets and attack types.
Read more
Choosing the Right Regularizer for Applied ML: Simulation Benchmarks of Popular Scikit-learn Regularization Frameworks
Benjamin S. Knight, Ahsaas Bajaj
Theory Optimization Efficient ML
  • Ridge, Lasso, and ElasticNet are nearly interchangeable for prediction accuracy with sufficient sample-to-feature ratios.
  • Lasso's recall is fragile under multicollinearity, with significant performance degradation in challenging conditions.
  • ElasticNet outperforms Lasso in high multicollinearity scenarios, maintaining higher recall rates.
  • The paper provides a decision guide for selecting appropriate regularization frameworks based on feature space attributes.
Read more
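A miniature version of such a simulation, with illustrative settings rather than the paper's benchmark grid: correlated features, a sparse true signal, and support recall compared across penalties.
```python
# Support recall of Lasso vs. ElasticNet under heavy multicollinearity.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n, p, k = 200, 50, 5
# Highly correlated features built from one shared latent factor.
latent = rng.normal(size=(n, 1))
X = 0.9 * latent + 0.1 * rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k] = 2.0
y = X @ beta + rng.normal(size=n)

for name, model in [("lasso", Lasso(alpha=0.1, max_iter=5000)),
                    ("elasticnet", ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=5000))]:
    model.fit(X, y)
    recall = np.mean(model.coef_[:k] != 0)   # fraction of true features kept
    print(f"{name}: support recall = {recall:.2f}")
```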