AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting
Alper Yıldırım
Time Series
  • A single-layer transformer can match the performance of deeper models in time series forecasting.
  • Expanding the dictionary size in sparse autoencoders yields minimal impact on forecasting performance.
  • Transformers do not rely on superposition for effective representation in time series tasks.
  • The findings suggest that the complexity of transformers may not be justified for time series forecasting.
Read more
Towards Metric-Faithful Neural Graph Matching
Jyotirmaya Shivottam, Subhankar Mishra
Graph Learning Theory Optimization
  • Introduces a geometric framework linking encoder geometry to GED estimation quality.
  • Demonstrates that bi-Lipschitz encoders improve stability and accuracy in GED surrogates.
  • Establishes a theoretical basis for the impact of encoder distortion on downstream estimators.
  • Empirical results show significant performance improvements using geometry-aware variants.
Read more
A Robust Foundation Model for Conservation Laws: Injecting Context into Flux Neural Operators via Recurrent Vision Transformers
Taeyoung Kim, Joon-Hyuk Ko
Theory Efficient ML Time Series
  • Introduces a context-conditioned Flux Neural Operator that leverages Recurrent Vision Transformers for enhanced performance in solving conservation laws.
  • The model infers latent numerical flux operators from short observed trajectories, allowing for adaptability without explicit PDE knowledge.
  • Demonstrates improved autoregressive stability and robustness compared to traditional PDE foundation models on benchmark problems.
  • Preserves the conservative structure of numerical updates, crucial for accurate long-time predictions in nonlinear hyperbolic problems.
Read more
Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization
Noel Thomas
Optimization Theory
  • Existing benchmarks in Bayesian Optimization often fail to account for regime variables, leading to unreliable performance rankings.
  • The Portable Regime Score (PRS) is introduced as a method to quantify and predict the impact of regime variables on algorithm performance.
  • The REGIMEPLANNER framework demonstrates the practical application of PRS, outperforming traditional acquisition strategies in various benchmarks.
  • A significant portion of the literature does not vary critical parameters, which skews the reported effectiveness of algorithms.
Read more
Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction
Dan Wilson, Mohamed Akrout
NLP Large Language Models Theory
  • Introduces a novel method for hallucination detection in LLMs by treating them as dynamical systems (a generic sketch of this idea follows the entry).
  • Utilizes a differential error score to distinguish between factual and hallucinated responses in a single pass.
  • Achieves state-of-the-art performance across multiple benchmarks with reduced resource requirements.
  • Incorporates a calibration mechanism for user-specific detection preferences.
Read more
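A minimal sketch of the general idea rather than the paper's differential error score: fit a linear next-state predictor on hidden-state trajectories and score a response by its one-step prediction error. The hidden states and the linear predictor below are illustrative assumptions.

```python
import numpy as np

def fit_linear_dynamics(H):
    """Fit H[t+1] ~= H[t] @ A by least squares over a hidden-state trajectory."""
    X, Y = H[:-1], H[1:]                       # (T-1, d) consecutive-state pairs
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)  # (d, d) one-step transition map
    return A

def prediction_error_score(H, A):
    """Mean one-step prediction error; a higher score means less regular dynamics."""
    residual = H[1:] - H[:-1] @ A
    return float(np.mean(np.linalg.norm(residual, axis=1)))

rng = np.random.default_rng(0)
H_smooth = np.cumsum(0.1 * rng.normal(size=(64, 16)), axis=0)  # regular trajectory
H_erratic = rng.normal(size=(64, 16))                          # erratic trajectory
A = fit_linear_dynamics(H_smooth)
print(prediction_error_score(H_smooth, A), prediction_error_score(H_erratic, A))
```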
From Video-to-PDE: Data-Driven Discovery of Nonlinear Dye Plume Dynamics
Cesar Acosta-Minoli, Sayantan Sarkar
Computer Vision Theory Interpretability
  • Development of a video-to-PDE pipeline for extracting models from dye-plume dynamics.
  • Utilization of weak-form regression to mitigate issues with noisy video data.
  • Introduction of a robust model selection protocol based on forward-rollout performance.
  • The selected PDE model outperforms traditional advection-diffusion models.
Read more
Is Escalation Worth It? A Decision-Theoretic Characterization of LLM Cascades
Dylan Bouchard
Large Language Models Optimization Efficient ML
  • Developed a decision-theoretic framework for analyzing LLM cascades, linking cost and quality through optimization.
  • Characterized the cost-quality frontier as a pointwise envelope over pairwise cascades, which can significantly reduce costs (the envelope computation is sketched below).
  • Established first-order conditions that account for model confidence scores and their impact on expected quality.
  • Demonstrated that a pre-generation router can outperform traditional cascade policies, highlighting structural cost advantages.
Read more
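A minimal sketch of the envelope construction: given (expected cost, expected quality) points for candidate pairwise cascades, keep only the non-dominated ones. The candidate numbers are illustrative, not from the paper.

```python
def upper_envelope(points):
    """Keep cascades no other point dominates (cheaper AND at least as good)."""
    frontier = []
    for cost, quality in sorted(points):                 # ascending cost
        while frontier and frontier[-1][0] >= cost and frontier[-1][1] <= quality:
            frontier.pop()                               # dominated by the newcomer
        if not frontier or quality > frontier[-1][1]:
            frontier.append((cost, quality))             # keep only quality gains
    return frontier

# (expected cost per query, expected quality) for hypothetical pairwise cascades
candidates = [(1.0, 0.62), (2.5, 0.71), (2.4, 0.70), (6.0, 0.74), (5.5, 0.74)]
print(upper_envelope(candidates))  # -> the cost-quality frontier
```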
Robustness of Graph Self-Supervised Learning to Real-World Noise: A Case Study on Text-Driven Biomedical Graphs
Othmane Kabal, Mounira Harzallah, Fabrice Guillet, Hideaki Takeda, Ryutaro Ichise
Graph Learning
  • Introduction of NATD-GSSL framework for robust GSSL on noisy graphs.
  • Development of a dual-graph evaluation protocol for assessing GSSL performance.
  • Empirical analysis reveals variability in robustness across GSSL methods and GNN architectures.
  • Bidirectional GNN architectures are more effective on noisy graphs than unidirectional ones.
Read more
A Self-Attentive Meta-Optimizer with Group-Adaptive Learning Rates and Weight Decay
JiangBo Zhao, ZhaoXin Liu
Optimization Time Series NLP
  • MetaAdamW integrates self-attention to dynamically modulate learning rates and weight decay (a generic gating sketch follows the entry).
  • The optimizer uses a meta-learning objective to train the attention module effectively.
  • It extends homoscedastic uncertainty weighting with task-specific priorities for better loss balancing.
  • MetaAdamW outperforms standard AdamW across multiple tasks, improving performance and reducing training time.
Read more
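A hedged sketch of the general mechanism rather than the paper's MetaAdamW: a tiny self-attention module reads per-group statistics and emits multiplicative gates for each group's learning rate and weight decay. The module, feature choices, and scales are assumptions, and the meta-learning of the gate itself is omitted.

```python
import torch
import torch.nn as nn

class GroupGate(nn.Module):
    """Self-attention over parameter groups -> (lr scale, wd scale) per group."""
    def __init__(self, n_features=2, d_model=16):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=2, batch_first=True)
        self.head = nn.Linear(d_model, 2)

    def forward(self, stats):                   # stats: (n_groups, n_features)
        x = self.embed(stats).unsqueeze(0)      # (1, n_groups, d_model)
        x, _ = self.attn(x, x, x)               # groups attend to one another
        return 2.0 * torch.sigmoid(self.head(x)).squeeze(0)  # gates in (0, 2)

def group_stats(param_groups):
    """Per-group (mean grad norm, mean param norm) as the gate's input features."""
    rows = []
    for g in param_groups:
        gn = torch.stack([p.grad.norm() for p in g["params"]])
        pn = torch.stack([p.norm() for p in g["params"]])
        rows.append(torch.stack([gn.mean(), pn.mean()]))
    return torch.stack(rows)

# One illustrative step: gate the groups' lr / weight decay, then AdamW steps.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
groups = [{"params": list(model[0].parameters()), "lr": 1e-3, "weight_decay": 1e-2},
          {"params": list(model[2].parameters()), "lr": 1e-3, "weight_decay": 1e-2}]
opt, gate = torch.optim.AdamW(groups), GroupGate()

loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()
with torch.no_grad():
    scales = gate(group_stats(opt.param_groups))         # (n_groups, 2)
for g, (s_lr, s_wd) in zip(opt.param_groups, scales.tolist()):
    g["lr"] *= s_lr
    g["weight_decay"] *= s_wd
opt.step()
```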
Bandit Learning in General Open Multi-agent Systems
Mengfan Xu
Theory Optimization
  • Introduces a unified framework for bandit learning in open multi-agent systems, addressing limitations of existing models.
  • Defines new concepts such as pre-training degree and stability to capture the complexities of dynamic agent populations.
  • Develops certified global-UCB learning methodologies with provable regret bounds (the classical UCB rule they build on is sketched below).
  • Demonstrates that regret is influenced by both the pre-training degree of new agents and the stability of the system.
Read more
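For reference, the classical UCB1 rule that global-UCB methods build on; the open-system machinery above (agent arrivals, pre-training degree, stability) is omitted.

```python
import math, random

def ucb1(pull, n_arms, horizon):
    """Play each arm once, then pick by empirical mean plus exploration bonus."""
    counts, sums = [0] * n_arms, [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1
        else:
            arm = max(range(n_arms), key=lambda a:
                      sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += pull(arm)
    return counts

random.seed(0)
means = [0.3, 0.5, 0.7]
counts = ucb1(lambda a: random.gauss(means[a], 0.1), n_arms=3, horizon=2000)
print(counts)  # the best arm (index 2) should dominate the pull counts
```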
The Predictive-Causal Gap: An Impossibility Theorem and Large-Scale Neural Evidence
Kejun Liu
Theory
  • The predictive-causal gap is a structural limit in predictive representation learning.
  • Optimal encoders often track environmental dynamics rather than system dynamics.
  • The gap intensifies with higher dimensionality, leading to significant misalignment.
  • Operational grounding can partially suppress the gap but does not fully recover causal fidelity.
Read more
Probabilistic Classification and Uncertainty Quantification of Sahara Desert Climate Using Feedforward Neural Networks
Stephen Tivenan, Indranil Sahoo, Yanjun Qian
Time Series
  • Introduction of a probabilistic framework for climate classification using feedforward neural networks.
  • Application of the model to the Sahara Desert, utilizing extensive climate data over a 30-year period.
  • Comparison of the ANN-based probabilistic classification with the traditional Köppen-Trewartha classification.
  • Identification of significant fluctuations in climate probabilities, contributing to understanding desertification.
Read more
Designing a double deep reinforcement learning selection tool for resilient demand prediction
Bilel Abderrahmane Benziane, Benoit Lardeux, Ayoub Mcharek, Maher Jridi
Reinforcement Learning Time Series Optimization
  • Introduction of a double deep reinforcement learning architecture for dynamic forecasting model selection.
  • Development of an average reward-based early stopping technique to reduce training time.
  • Empirical evaluation against state-of-the-art methods using diverse datasets.
  • Demonstration of the proposed approach's robustness in varying data conditions.
Read more
How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation
Shai Feldman, Yaniv Romano
Large Language Models Optimization Theory
  • DAPRO is the first dynamic budget allocation framework for multi-turn LLM evaluations.
  • It provides distribution-free, finite-sample coverage guarantees without requiring conditional independence assumptions.
  • The framework yields tighter coverage bounds by scaling with the mean censoring weight.
  • Experiments show DAPRO outperforms static allocation methods in terms of coverage and variance.
Read more
From Drops to Grid: Noise-Aware Spatio-Temporal Neural Process for Rainfall Estimation
Rafael Pablos Sarabia, Joachim Nyborg, Morten Birk, Ira Assent
Time Series Multimodal
  • Introduces DropsToGrid, a Neural Process-based method for rainfall densification.
  • Combines temporal sequences from PWS with radar context for improved accuracy.
  • Utilizes multi-modal attention and translation-equivariant fusion for effective spatio-temporal reasoning.
  • Demonstrates superior performance over traditional and deep learning baselines in real-world evaluations.
Read more
Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients
Linus Aronsson, Morteza Haghir Chehreghani
Reinforcement Learning Optimization Efficient ML
  • Introduces NM-PPG, a new method for non-myopic AFA using pathwise policy gradients.
  • Utilizes a continuous relaxation of the acquisition process to enable end-to-end optimization.
  • Implements a straight-through rollout scheme to improve alignment between training and deployment (a generic straight-through sketch follows this entry).
  • Stabilizes optimization with entropy regularization and staged temperature sharpening.
Read more
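A generic straight-through relaxation sketch, not NM-PPG itself: the forward pass takes hard acquire/skip decisions while gradients flow through the soft relaxation, and the temperature mirrors the staged sharpening mentioned above. Shapes and costs are illustrative.

```python
import torch

def straight_through_acquire(logits, tau=1.0):
    """Hard 0/1 acquisition mask in the forward pass, soft gradients in backward."""
    soft = torch.sigmoid(logits / tau)      # continuous relaxation in (0, 1)
    hard = (soft > 0.5).float()             # discrete acquire / skip decision
    return hard + (soft - soft.detach())    # value: hard; gradient: d soft

logits = torch.randn(5, requires_grad=True)
mask = straight_through_acquire(logits, tau=0.5)
features = torch.randn(5)
cost = 0.1 * mask.sum()                     # pay a price per acquired feature
utility = (mask * features).sum()           # downstream value of what was acquired
(cost - utility).backward()                 # gradients reach the acquisition logits
print(mask, logits.grad)
```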
Investigating Trustworthiness of Nonparametric Deep Survival Models for Alzheimer's Disease Progression Analysis
Jacob Thrasher, Kaitlyn Heintzelman, Peter Martone, David Kotlowski, Binod Bhattarai, Donald Adjeroh, Prashnna Gyawali
Theory Interpretability Time Series
  • Introduction of two novel fairness metrics for nonparametric deep survival models.
  • Comprehensive evaluation pipeline for assessing model performance in AD progression.
  • Significant bias found in deep survival models concerning sensitive attributes.
  • Emphasis on the importance of fairness and interpretability in survival analysis.
Read more
Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize
Sarwan Ali
Theory
  • The memorization versus reasoning outcome in Transformers is determined within a critical training window.
  • Weight decay applied only during a specific training phase can yield OOD accuracy comparable to applying it throughout training (a minimal windowed-decay sketch follows this entry).
  • The timing of regularization is more important than its magnitude for achieving reasoning solutions.
  • The critical window's position is influenced by initialization scale, with smaller scales leading to a reduced basin of attraction for reasoning.
Read more
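A minimal sketch of the intervention as described: weight decay switched on only inside a window of training steps. The window bounds and optimizer settings are placeholders, not values from the paper.

```python
import torch

def make_step_fn(model, base_lr=1e-3, wd=0.1, window=(1000, 3000)):
    opt = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=0.0)
    def step(loss, t):
        in_window = window[0] <= t < window[1]
        for g in opt.param_groups:          # decay applies only inside the window
            g["weight_decay"] = wd if in_window else 0.0
        opt.zero_grad()
        loss.backward()
        opt.step()
    return step

model = torch.nn.Linear(16, 4)
step = make_step_fn(model)
for t in range(5):                          # toy loop; a real run spans the window
    step(model(torch.randn(8, 16)).pow(2).mean(), t)
```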
Transformed Latent Variable Multi-Output Gaussian Processes
Xiaoyu Jiang, Xinxing Shi, Sokratia Georgaka, Magnus Rattray, Mauricio A Álvarez
Theory Efficient ML Time Series
  • Introduction of T-LVMOGP, a scalable framework for MOGPs.
  • Utilization of a Lipschitz-regularized neural network for mapping inputs and latent variables.
  • Integration of stochastic variational inference for efficient training.
  • Demonstrated superior performance in predictive accuracy and efficiency over existing methods.
Read more
Normalized Architectures are Natively 4-Bit
Maxim Fishman, Brian Chmiel, Ron Banner, Daniel Soudry, Boris Ginsburg
Large Language Models Efficient ML Optimization
  • nGPT architecture is natively robust to 4-bit quantization, requiring no additional overhead fixes.
  • Robustness arises from coherent signal accumulation rather than noise suppression.
  • Training dynamics under the hypersphere constraint promote distributed alignments across dimensions.
  • Empirical validation shows lower relative error and stability across diverse model configurations (a toy demo of how relative quantization error is measured follows).
Read more
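To make "relative error" concrete, a toy 4-bit symmetric quantization of a unit-norm vector; nothing here is nGPT-specific, and the hypersphere-constrained training the paper credits for robustness is not modeled.

```python
import numpy as np

def quantize_int4(x):
    """Symmetric 4-bit quantization: round to the int4 grid, then rescale."""
    scale = np.abs(x).max() / 7.0           # int4 symmetric range is -8..7
    return np.clip(np.round(x / scale), -8, 7) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=4096)
w /= np.linalg.norm(w)                      # unit norm, as in normalized networks
err = np.linalg.norm(w - quantize_int4(w)) / np.linalg.norm(w)
print(f"relative quantization error: {err:.4f}")
```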
SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation
Dmitri Goloubentsev, Natalija Karpichina
Reinforcement Learning Optimization Theory
  • SNAPO integrates a neural policy within differentiable simulators for optimal control.
  • The framework allows for the computation of exact gradients in a single adjoint pass.
  • SNAPO demonstrates significant speedups in sensitivity analysis compared to traditional methods.
  • The approach is validated across three diverse domains with rapid training times.
Read more
Do Neural Operators Forget Geometry? The Forgetting Hypothesis in Deep Operator Learning
Yanming Xia, Angelica I. Aviles-Rivero
Theory
  • Introduction of the Geometric Forgetting Hypothesis, highlighting the loss of geometric information in deep operator architectures.
  • Demonstration of systematic geometric information decay through layer-wise probing in spectral and attention-based operators.
  • Identification of a structural limitation in transformer-based operators, termed the Geometric Shortcut, which leads to feature collapse under late geometry injection.
  • Proposal of a Geometry Memory Injection mechanism that effectively restores geometric information flow with minimal architectural changes.
Read more
Bridging Input Feature Spaces Towards Graph Foundation Models
Moshe Eliasof, Krishna Sri Ipsit Mantri, Beatrice Bevilacqua, Bruno Ribeiro, Carola-Bibiane Schönlieb
Graph Learning
  • Introduces ALL-IN, a method for transferring knowledge across graph datasets with varying input features.
  • Utilizes covariance-based statistics to create robust node representations independent of original feature spaces.
  • Demonstrates theoretical invariance properties of node-covariance operators to permutations and transformations.
  • Achieves strong empirical performance on diverse tasks without the need for architecture changes or retraining.
Read more
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers
Pengqi Lu
Generative Models Multimodal Computer Vision
  • Characterization of Mean Mode Screaming (MMS) and its impact on training stability in ultra-deep DiTs.
  • Introduction of Mean–Variance Split (MV-Split) Residuals to address the mean-dominated collapse state.
  • Demonstration of MV-Split's effectiveness in preventing collapse and improving convergence rates compared to existing methods.
  • Successful training of a 1000-layer DiT, showcasing the architecture's scalability and stability.
Read more
MixINN: Accelerating Plant Breeding by Combining Mixed Models and Deep Learning for Interaction Prediction
Aike Potze, Fred van Eeuwijk, Ioannis N. Athanasiadis
Optimization
  • MixINN combines mixed models and deep learning for improved prediction of genotype-environment interactions.
  • The approach addresses the limitations of linear models by capturing nonlinear relationships in crop yield predictions.
  • Evaluation on a corn multi-environment trial dataset showed significant improvements in identifying high-yielding genotypes.
  • MixINN achieved a 5.8% increase in average yield for the top 20% of corn genotypes, with further improvements in specific environments.
Read more
Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management
Haoyu Zheng, Fangcheng Fu, Jia Wu, Binhang Yuan, Yongqiang Zhang, Hao Wang, Yuanyuan Zhu, Xiao Yan, Jiawei Jiang
Large Language Models Efficient ML
  • PBKV predicts future agent invocations to optimize KV-cache management in dynamic workflows.
  • The system employs hierarchical eviction and conservative prefetching to enhance cache reuse and mitigate prediction errors (a toy prediction-based eviction rule is sketched below).
  • PBKV achieves up to 1.85× speedup over LRU and 1.26× over KVFlow in experimental benchmarks.
  • The predictor's performance is robust to errors, ensuring graceful degradation.
Read more
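A toy eviction rule in the spirit of prediction-based KV management, not the PBKV implementation: drop the agent whose cache entry is predicted to be reused furthest in the future (a Belady-style choice). The predictor here is a stand-in dictionary.

```python
def evict(cache, predicted_next_use):
    """Return the agent whose KV block is predicted to be needed last."""
    return max(cache, key=lambda agent: predicted_next_use.get(agent, float("inf")))

cache = {"planner", "coder", "critic"}                         # agents with resident KV
predicted_next_use = {"planner": 2, "coder": 15, "critic": 4}  # steps until reuse
print(evict(cache, predicted_next_use))                        # -> "coder"
```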
Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs
Santosh Premi Adhikari, Radu Timofte, Dmitry Ignatov
Computer Vision Large Language Models Efficient ML
  • Introduction of Delta-Code Generation for NAS, focusing on generating compact diffs instead of full models.
  • Significant reduction in output length (75-85%) while preserving competitive accuracy metrics.
  • Evaluation of three different LLMs across six diverse datasets, showcasing the robustness of the approach.
  • Demonstration of the method's ability to maintain structural integrity and improve existing architectures.
Read more
MDN: Parallelizing Stepwise Momentum for Delta Linear Attention
Yulong Huang, Xiang Liu, Hongxiang Huang, Xiaopeng Lin, Zunchang Liu, Xiaowen Chu, Zeke Xie, Bojun Cheng
NLP Large Language Models Optimization
  • MDN introduces a chunkwise parallel algorithm that preserves causality while enhancing training efficiency.
  • The model leverages stepwise momentum to improve representation robustness and retrieval performance.
  • Extensive experiments show MDN outperforms existing models like Transformers, Mamba2, and GDN.
  • The paper provides a novel perspective on momentum-based updates as a second-order dynamical system.
Read more
Can Attribution Predict Risk? From Multi-View Attribution to Planning Risk Signals in End-to-End Autonomous Driving
Le Yang, Ruoyu Chen, Haijun Liu, Jiawei Liang, ShangQuan Sun, Xiaochun Cao
Computer Vision Robotics Interpretability
  • Attribution can be used as a predictive signal for planning risk in autonomous driving.
  • A novel coarse-to-fine attribution method is proposed for analyzing multi-view inputs.
  • Three statistics derived from attribution maps effectively quantify decision-level risks.
  • Experiments show strong correlation between attribution statistics and planning risks.
Read more
RVPO: Risk-Sensitive Alignment via Variance Regularization
Ivan Montero, Tomasz Jurczyk, Bhuwan Dhingra
Reinforcement Learning Large Language Models Optimization
  • RVPO addresses constraint neglect in multi-objective RLHF by penalizing inter-reward variance.
  • The LogSumExp operator is shown to effectively act as a smooth variance penalty (illustrated numerically below).
  • RVPO improves performance on HealthBench and maintains accuracy on GPQA-Diamond without late-stage degradation.
  • The framework is validated across multiple reward signals and tool-calling scenarios.
Read more
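A numeric illustration of the LogSumExp claim: aggregating rewards with a negative LogSumExp behaves like a smooth minimum, so a high-variance reward vector scores worse than a balanced one with the same mean. The temperature and reward values are illustrative.

```python
import numpy as np

def lse_aggregate(rewards, tau=0.5):
    """-tau * log(mean(exp(-r / tau))): a smooth minimum over reward signals."""
    r = np.asarray(rewards)
    return -tau * np.log(np.mean(np.exp(-r / tau)))

balanced = [0.60, 0.60, 0.60]   # mean 0.6, zero variance
lopsided = [0.95, 0.70, 0.15]   # mean 0.6, high variance
print(np.mean(balanced), lse_aggregate(balanced))  # 0.6 and 0.6
print(np.mean(lopsided), lse_aggregate(lopsided))  # 0.6 but a noticeably lower score
```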
Hypothesis generation and updating in large language models
Hua-Dong Xiong
Large Language Models NLP Theory
  • LLMs exhibit a two-parameter Bayesian fit but with systematic biases favoring narrower hypotheses.
  • A strong-sampling assumption leads to an implicit Occam's razor effect in hypothesis generation.
  • There is a robust evaluation-generation gap, with LLMs selecting more accurate hypotheses during evaluation than during generation.
  • LLMs generalize poorly to hypothesis domains not covered by observed examples, indicating limitations in their inference capabilities.
Read more
Towards Scalable One-Step Generative Modeling for Autoregressive Dynamical System Forecasting
Tianyue Yang, Xiao Xue
Generative Models Time Series Efficient ML
  • Introduction of MeLISA, a stochastic autoregressive surrogate model that requires only one function evaluation per forecast block.
  • Development of Window-Consistency MeanFlow for non-trivial one-step generative forecasting using masked temporal context.
  • Implementation of Time Increment Consistency to enforce long-horizon temporal correlations and mixing behavior.
  • Demonstrated superior performance of MeLISA on high-resolution benchmarks compared to traditional neural operators.
Read more
Validity-Calibrated Reasoning Distillation
Khouloud Saadi, Di Wang
NLP Large Language Models Efficient ML
  • VCRD treats reasoning distillation as local learning-signal allocation rather than trajectory imitation.
  • The framework evaluates the local validity of teacher and student proposals to modulate distillation updates.
  • VCRD preserves teacher guidance while adapting supervision to the quality of local reasoning.
  • The method shows improved performance across mathematical reasoning, code generation, and instruction-following tasks.
Read more
Knowledge-Free Correlated Agreement for Incentivizing Federated Learning
Leon Witt, Togrul Abbasli, Kentaroh Toyoda, Wojciech Samek, Lucy Klinger
Federated Learning Theory Efficient ML
  • KFCA provides a knowledge-free incentive mechanism for federated learning, avoiding the need for ground truth.
  • The mechanism is strongly truthful under a categorical-world condition, mitigating vulnerabilities present in previous methods.
  • KFCA enables real-time reward computation, making it applicable to decentralized and blockchain-based FL systems.
  • Empirical evaluations show KFCA significantly reduces reward computation costs compared to traditional methods like Shapley value estimators.
Read more
Directional Consistency as a Complementary Optimization Signal: The GONO Framework
Victor Daniel Gera
Optimization
  • Identifies the direction-loss decoupling phenomenon in optimization, where directional consistency does not guarantee loss convergence.
  • Introduces GONO, an optimizer that adapts momentum based on directional alignment, improving performance when gradients oscillate (a generic alignment-damped momentum sketch follows).
  • Proves GONO matches Adam's convergence rate while providing a more effective mechanism for handling gradient direction.
  • Empirical validation shows GONO's effectiveness on standard datasets like MNIST and CIFAR-10.
Read more
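A generic sketch of alignment-adapted momentum, not the GONO update itself: the cosine between the fresh gradient and the momentum buffer damps the step whenever directions oscillate.

```python
import numpy as np

def aligned_momentum_step(w, grad, buf, lr=0.1, beta=0.9):
    cos = 0.0
    if np.linalg.norm(buf) > 0:
        cos = grad @ buf / (np.linalg.norm(grad) * np.linalg.norm(buf) + 1e-12)
    damp = 0.5 * (1.0 + cos)                # 1 when aligned, toward 0 when opposed
    buf = beta * buf + grad
    return w - lr * damp * buf, buf

w, buf = np.array([3.0, -2.0]), np.zeros(2)
for _ in range(50):                         # minimize f(w) = ||w||^2 / 2, grad = w
    w, buf = aligned_momentum_step(w, w, buf)
print(w)                                    # approaches the optimum at the origin
```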
Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols
Jaewook Kim, Hyeoncheol Kim
Theory
  • Empirical performance gains in PKT models can be sensitive to implementation details and experimental design.
  • Improper ordering of student attempts can lead to data leakage and inflated performance estimates.
  • A controlled evaluation protocol is proposed to ensure consistent and fair benchmarking of PKT models.
  • The performance advantage of attention-augmented models is diminished under controlled settings.
Read more
A Regulatory Governance Framework for AI-Driven Financial Fraud Detection in U.S. Banking: Integrating OCC, SR 11-7, CFPB, and FinCEN Compliance Requirements for Model Development, Validation, and Monitoring Lifecycles
Mohammad Nasir Uddin
Interpretability
  • Introduces the RGF-AFFD framework integrating multiple regulatory compliance requirements for AI in fraud detection.
  • Demonstrates the performance of an LSTM+XGBoost ensemble model with a ROC-AUC of 0.9289.
  • Addresses the critical gap in existing literature regarding unified regulatory compliance for AI models in finance.
  • Provides a Regulatory Digital Twin meta-model for continuous compliance monitoring.
Read more
Diversity Curves for Graph Representation Learning
Katharina Limbeck, Nadja Häusermann, Martin Carrasco, Guy Wolf, Bastian Rieck
Graph Learning
  • Introduction of diversity curves for size-aware graph representation learning.
  • Demonstration of improved expressivity through edge contraction coarsening.
  • Diversity curves outperform traditional methods in various applications.
  • Method provides interpretable and scalable graph representations.
Read more
Graph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learning
Shawn Ray
Reinforcement Learning Graph Learning Theory
  • Graph-SND provides a sparse aggregation method for measuring behavioral diversity in MARL.
  • It reduces the computational cost of SND from quadratic to linear or constant time, depending on the graph structure used (see the cost sketch after this entry).
  • The method maintains the semantics of SND while enabling efficient diversity control.
  • Empirical results show significant improvements in metric computation time and diversity tracking.
Read more
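A sketch of the cost argument above: mean pairwise policy distance over all agent pairs is O(n²), while restricting to a sparse neighbor graph with O(n) edges is linear. The embeddings and graph are illustrative; this is not the paper's SND estimator.

```python
import numpy as np

def full_diversity(Z):
    """Mean pairwise distance over all n*(n-1)/2 pairs: quadratic in n."""
    n = len(Z)
    d = [np.linalg.norm(Z[i] - Z[j]) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(d))

def sparse_diversity(Z, edges):
    """Mean distance over a sparse edge set: linear when |edges| is O(n)."""
    return float(np.mean([np.linalg.norm(Z[i] - Z[j]) for i, j in edges]))

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 8))                        # per-agent policy embeddings
ring = [(i, (i + 1) % 100) for i in range(100)]      # O(n) ring graph
print(full_diversity(Z), sparse_diversity(Z, ring))  # similar values, far less work
```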
Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs
Andy Zeyi Liu, Michael Zhang, Ilana Greenberg, Adam Alnasser, Lucas Baker, John Sous
NLP Large Language Models
  • Memory Inception (MI) is a training-free method for steering LLMs using latent KV banks (the basic cache-injection idea is sketched below).
  • MI provides a better control-drift trade-off compared to traditional prompting and outperforms CAA.
  • The method allows for mid-conversation behavior shifts without rewriting the visible transcript.
  • MI improves performance on structured reasoning tasks while significantly reducing KV storage needs.
Read more
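A conceptual sketch of latent KV injection, not the MI method: prepend a stored bank of key/value pairs to the attention cache so queries attend to them without any visible prompt text. The single-head attention, shapes, and bank contents are assumptions.

```python
import numpy as np

def attend(q, K, V):
    """Single-head scaled dot-product attention for one query."""
    scores = K @ q / np.sqrt(len(q))
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ V

rng = np.random.default_rng(0)
d = 8
K_ctx, V_ctx = rng.normal(size=(6, d)), rng.normal(size=(6, d))    # visible context
K_bank, V_bank = rng.normal(size=(3, d)), rng.normal(size=(3, d))  # steering bank
q = rng.normal(size=d)

plain = attend(q, K_ctx, V_ctx)
steered = attend(q, np.vstack([K_bank, K_ctx]), np.vstack([V_bank, V_ctx]))
print(np.linalg.norm(steered - plain))  # the injected bank shifts the output
```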
Matrix-Decoupled Concentration for Autoregressive Sequences: Dimension-Free Guarantees for Sparse Long-Context Rewards
Pei-Sen Li
Large Language Models Reinforcement Learning Theory
  • Introduction of the Matrix-Decoupled Concentration (MDC) framework to address concentration bounds in autoregressive sequences.
  • Establishment of a McDiarmid-type inequality that prevents scalar collapse and guarantees dimension-free variance proxies.
  • Demonstration of the framework's ability to recover optimal transport constants for Markov chains and establish bounds for causal trees.
  • Proof of stability in long-context generation for LLMs by preserving coordinate-wise sparsity of sensitivity vectors.
Read more
Bilinear Mamba-Koopman Neural MPC for Varying Dynamics
Matan Pagi, Zohar Sorek
Optimization Reinforcement Learning Robotics
  • Introduces Bilinear Mamba-Koopman Neural MPC to address limitations of existing Koopman-based models.
  • Allows for control-dependent coupling in latent dynamics, enhancing adaptability to time-varying conditions.
  • Maintains convexity while adding minimal parameters and enabling efficient Sequential Convex Programming.
  • Empirical results show improved forecasting accuracy and training stability in time-varying environments.
Read more
Diffusion-Based Posterior Sampling: A Feynman-Kac Analysis of Bias and Stability
Matias G. Delgadino, Sebastien Motsch, Advait Parulekar, William Porteous, Sanjay Shakkottai
Generative Models Theory Optimization
  • Introduces a Feynman-Kac framework to analyze bias and stability in diffusion-based posterior samplers.
  • Derives an exact bias formula for DPS, identifying regions of over- and under-sampling.
  • Reinterprets STSL as a corrective measure to reduce bias by guiding trajectories toward low-uncertainty areas.
  • Quantifies the instability in low-temperature regimes and characterizes early guidance-stopping as a heuristic.
Read more
Exact Dual Geometry of SOC-ICNN Value Functions
Kang Liu, Jianchen Hu, Wei Peng
Optimization Theory
  • SOC-ICNNs provide an exact value-function representation for second-order cone programs.
  • The dual viewpoint allows for the recovery of geometric properties such as subgradients and local curvature.
  • The paper presents a structured readout mechanism for extracting first-order information from dual solutions.
  • Numerical experiments confirm the effectiveness of the proposed methods and their applicability in real-world scenarios.
Read more
Adaptive Learning Strategies for AoA-Based Outdoor Localization: A Comprehensive Framework
Bac Trinh-Nguyen, Sara Berri, Sin G. Teo, Tram Truong-Huu, Arsenia Chorti
Theory Optimization Efficient ML
  • Proposes an adaptive framework for AoA-based localization suitable for varying dataset sizes.
  • Achieves 100% accuracy in distinguishing LoS and NLoS regions using a hierarchical offline learning approach.
  • Implements an online learning strategy that maintains high accuracy with small datasets and low forgetting rates.
  • Demonstrates the potential for robust localization in outdoor wireless environments with low-latency solutions.
Read more
HUGO-CS: A Hybrid-Labeled, Uncertainty-Aware, General-Purpose, Observational Dataset for Cold Spray
Stephen Price, Kyle Miller, Marco Musto, Kenneth Kroenlein, James Saal, Kyle Tsaknopoulos, Elke A. Rundensteiner, Danielle L. Cote
Optimization
  • HUGO-CS contains 4,383 cold-spray experiments, a 30x increase over previous datasets.
  • The HUGO framework combines LLM-based automated extraction with manual refinement for accuracy.
  • A Hierarchical Risk Mitigation strategy is implemented to balance labeling efficiency and accuracy.
  • The dataset includes extensive post-processing to standardize and normalize data for usability.
Read more
On the Architectural Complexity of Neural Networks
Nicholas J. Cooper, François G. Meyer, Michael L. Roberts, Carlos Zapata-Carratalá, Lijun Chen, Danna Gurari
Theory Efficient ML
  • Introduces a hierarchical combinatorial framework for neural networks that models tensor operations explicitly.
  • Analyzes the evolution of architectural complexity in DNNs over the past 40 years, revealing connections between architecture breakthroughs and complexity increases.
  • Identifies unexplored classes of higher complexity architectures and provides a dataset of 3,028 novel architectures.
  • Demonstrates that new architectures can achieve high efficiency with fewer parameters compared to existing models.
Read more
Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven
Ran Ben-Basat, William Kuszmaul, Michael Mitzenmacher, Amit Portnoy, Shay Vargaftik
Theory Efficient ML Federated Learning
  • Two RHTs provide a uniform O(d^{-1/2}) approximation to Gaussian distributions for scalar quantization.
  • The performance of modern quantization schemes like DRIVE and QUIC-FL can be improved using RHTs (the rotate-then-quantize mechanism is sketched below).
  • Three RHTs are necessary for effective Vector Quantization to ensure weak correlation among coordinate blocks.
  • A linear-time check allows for dynamic adjustment of RHT usage based on input characteristics.
Read more
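A sketch of the rotate-then-quantize mechanism behind DRIVE/QUIC-FL-style schemes: random sign flips plus a fast Walsh-Hadamard transform spread a vector's energy before cheap uniform quantization. A single RHT is applied for brevity, whereas the paper's analysis chains two (scalar) or three (vector) transforms; d must be a power of two.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform with orthonormal scaling, O(d log d)."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(len(x))              # orthonormal, so fwht(fwht(x)) == x

def q4(v):
    """Uniform 4-bit quantization with a max-based scale."""
    scale = np.abs(v).max() / 7.0 + 1e-12
    return np.clip(np.round(v / scale), -8, 7) * scale

rng = np.random.default_rng(0)
d = 1024
x = rng.normal(size=d)
x[0] = 100.0                                 # one outlier wrecks max-based scaling
signs = rng.choice([-1.0, 1.0], size=d)

direct_err = np.linalg.norm(x - q4(x))
y = fwht(signs * x)                          # randomized rotation spreads the outlier
rht_err = np.linalg.norm(x - signs * fwht(q4(y)))
print(direct_err, rht_err)                   # the rotated pipeline should err less
```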