AI-generated summaries
Today's ML research, without the noise.
Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.
40 papers today
Updated every 8 hours
7 days of history
Unsupervised Learning of Inter-Object Relationships via Group Homomorphism
Computer Vision
Theory
Robotics
- Introduces an unsupervised learning method based on group homomorphism to model inter-object relationships.
- Demonstrates the ability to segment multiple objects and extract motion laws without ground-truth labels.
- Highlights the importance of algebraic geometric constraints for creating interpretable representations.
- Aims to replicate cognitive development processes observed in preverbal infants.
Summary
This paper presents a novel unsupervised representation learning method that leverages group homomorphism to model inter-object relationships, aiming to replicate the cognitive development processes observed in preverbal infants. Unlike traditional deep learning models that rely on statistical correlations, the proposed method focuses on the underlying structures of the world, enabling the model to autonomously acquire knowledge from limited experiences. The architecture integrates object segmentation and motion law extraction from dynamic image sequences, allowing for the structural separation of pixel-level changes into meaningful transformation components such as translation and deformation. The authors demonstrate the effectiveness of their approach through experiments involving interaction scenes, where the model successfully segments multiple objects without ground-truth labels and accurately maps their relative movements into a one-dimensional additive latent space. This work highlights the potential of algebraic geometric constraints in achieving physically interpretable disentangled representations, contributing to the understanding of how infants internalize environmental laws and offering insights for developing artificial systems with developmental intelligence.
Methodology
The proposed method consists of three main steps: (1) Object Segmentation to generate masks isolating individual objects from image sequences, (2) Separation of Object Motion using group homomorphism constraints to distinguish motion components like translation and deformation, and (3) Extraction of Multi-Object Interactions by relativizing the motion of each object to extract underlying interactions.
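To make the homomorphism constraint concrete, here is a minimal sketch (in PyTorch) of an additivity penalty on a learned motion code: composing two relative motions should add their latent codes, matching the one-dimensional additive latent space described above. The `MotionEncoder` architecture, the frame-triplet sampling, and the loss form are illustrative assumptions, not the paper's actual design.

```python
# Hedged sketch: an additivity (group-homomorphism) penalty on a 1-D motion code.
import torch
import torch.nn as nn

class MotionEncoder(nn.Module):
    """Maps a pair of frames (or masked object crops) to a 1-D motion code."""
    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, frame_a, frame_b):
        return self.net(torch.cat([frame_a, frame_b], dim=-1))

def homomorphism_loss(enc, f0, f1, f2):
    # If z encodes relative motion, composing two steps should add their codes:
    # z(f0 -> f2) ~= z(f0 -> f1) + z(f1 -> f2).
    z01, z12, z02 = enc(f0, f1), enc(f1, f2), enc(f0, f2)
    return ((z02 - (z01 + z12)) ** 2).mean()

# Toy usage on flattened 32x32 object crops.
enc = MotionEncoder(in_dim=32 * 32)
f0, f1, f2 = (torch.randn(8, 32 * 32) for _ in range(3))
loss = homomorphism_loss(enc, f0, f1, f2)
loss.backward()
```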
Results
The model successfully segments multiple objects and accurately maps their relative movements into a structured latent space, demonstrating the effectiveness of using algebraic constraints over traditional statistical learning methods.
Implications
This research suggests that incorporating algebraic structures into representation learning can enhance the flexibility and interpretability of AI systems, potentially leading to advancements in cognitive robotics and artificial intelligence that better mimic human learning processes.
Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation
Reinforcement Learning
Robotics
Interpretability
- MTRL networks reuse the vast majority of their weights across tasks, indicating effective knowledge sharing.
- Only a small fraction of weights (about 1.5%) are task-specific, suggesting that little specialization is needed for individual tasks.
- Context variables play a crucial role in enabling task differentiation in MTRL.
Summary
This paper addresses the challenges faced by autonomous underwater vehicles (AUVs) in navigating complex environments using reinforcement learning (RL). Traditional control methods struggle with dynamic conditions and limited sensing, necessitating robust and interpretable control policies. The authors explore multi-task reinforcement learning (MTRL) to leverage shared representations for efficient adaptation across tasks. However, existing MTRL approaches lack transparency, hindering real-world deployment. The study investigates a pretrained MTRL network in the HoloOcean simulator, identifying task-specific subnetworks that differentiate navigation tasks for various marine species. The findings reveal that only 1.5% of the network's weights are used for task differentiation, with 85% of these weights connecting context variables to the hidden layers. This highlights the significance of context in MTRL. The research contributes to understanding shared and specialized components in neural networks, facilitating model editing, transfer learning, and continual learning for underwater monitoring.
Methodology
The authors employed a pretrained Double DQN value network for underwater navigation, pruning the network to obtain task-specific subnetworks. They analyzed the overlap between these subnetworks to understand shared and task-specific components across different navigation tasks, initially testing their methodology in MiniGrid before applying it to the HoloOcean simulator.
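As a rough illustration of the subnetwork analysis, the sketch below derives per-task binary weight masks by magnitude pruning and measures their overlap. The pruning criterion, sparsity level, and overlap statistic are assumptions for illustration and need not match the paper's procedure.

```python
# Hedged sketch: per-task subnetwork masks via magnitude pruning, plus an
# overlap measure to separate shared from task-specific weights.
import numpy as np

def magnitude_mask(weights: np.ndarray, keep_frac: float = 0.05) -> np.ndarray:
    """Keep the largest-magnitude fraction of weights as a task subnetwork."""
    k = max(1, int(keep_frac * weights.size))
    thresh = np.partition(np.abs(weights).ravel(), -k)[-k]
    return np.abs(weights) >= thresh

def overlap(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Jaccard overlap between two subnetwork masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union

# Toy example with two "task-specific" versions of the same layer's weights.
rng = np.random.default_rng(0)
w_task_a = rng.normal(size=(256, 256))
w_task_b = w_task_a + 0.1 * rng.normal(size=(256, 256))  # mostly shared
m_a, m_b = magnitude_mask(w_task_a), magnitude_mask(w_task_b)
print(f"mask overlap (Jaccard): {overlap(m_a, m_b):.3f}")
print(f"task-specific fraction of all weights: {np.logical_xor(m_a, m_b).mean():.4f}")
```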
Results
The study found that MTRL networks use a significant portion of their weights for shared knowledge, with only a small percentage dedicated to task-specific functions. The analysis confirmed the importance of context variables in differentiating tasks effectively, providing insights into the internal structure of contextual MTRL networks.
Implications
The findings have implications for improving the interpretability and reliability of AUV control models, enhancing their deployment in real-world scenarios. The insights gained can inform future research in transfer learning and the development of more efficient and robust underwater navigation policies.
Quotient-Space Diffusion Models
Generative Models
- Introduces a formal framework for diffusion modeling on quotient spaces.
- Simplifies learning by treating equivalent objects as a single entity.
- Reduces the necessity of learning group actions, enhancing model efficiency.
- Empirical results show improved performance in molecular structure generation.
Summary
This paper introduces a novel framework for diffusion-based generative models that accounts for intrinsic symmetries in scientific applications, particularly in molecular structure generation. Traditional diffusion models often struggle with the inherent symmetry of tasks, such as the equivalence of molecular structures under transformations like rotation and translation. The authors propose a quotient-space diffusion model that simplifies the learning process by treating equivalent objects as a single entity, thereby reducing the complexity of learning the group actions associated with these symmetries. The framework is built on the mathematical concept of quotient spaces, allowing for a diffusion process that effectively projects updates onto subspaces that do not alter the intrinsic properties of the objects being modeled. This approach not only guarantees the recovery of the target distribution but also alleviates the need for the model to learn unnecessary movements within equivalence classes. Empirical validation demonstrates that the proposed model outperforms existing symmetry treatments in generating molecular structures, showcasing its effectiveness in scientific applications.
Methodology
The authors derive a diffusion process on a general quotient space and utilize horizontal lift to simulate this process in the original space. This method allows for effective projection of updates onto subspaces that do not induce movements within equivalence classes, thus simplifying the learning process.
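A minimal sketch of the projection idea, restricted to the translation symmetry of a point cloud: the noise added at each diffusion step is projected so it cannot move the sample along the group orbit (global shifts). Handling rotations requires a further projection and is omitted; the noise schedule and scaling below are placeholders, not the paper's construction.

```python
# Hedged sketch: a noising step whose update is projected onto the subspace
# that does not move within the translation equivalence class.
import numpy as np

def project_out_translation(update: np.ndarray) -> np.ndarray:
    """Remove the global-translation component (per-axis mean) of an update
    on an (n_atoms, 3) coordinate array."""
    return update - update.mean(axis=0, keepdims=True)

def noising_step(x: np.ndarray, beta: float, rng) -> np.ndarray:
    eps = rng.normal(size=x.shape)
    eps = project_out_translation(eps)          # stay inside the quotient space
    return np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps

rng = np.random.default_rng(0)
coords = rng.normal(size=(12, 3))               # toy "molecule"
coords -= coords.mean(axis=0)                   # canonical (centred) representative
noised = noising_step(coords, beta=0.02, rng=rng)
print(np.allclose(noised.mean(axis=0), 0.0))    # the centre of mass is preserved
```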
Results
The proposed quotient-space diffusion model was empirically validated on small molecules and proteins, demonstrating superior performance compared to conventional group-equivariant diffusion models. The results indicate that the new framework effectively captures the target distribution while reducing learning complexity.
Implications
This work has significant implications for generative modeling in scientific domains, particularly in areas requiring the generation of structures with inherent symmetries, such as molecular design and material science. The framework could potentially enhance the efficiency and accuracy of generative models in various scientific applications.
Even More Guarantees for Variational Inference in the Presence of Symmetries
Theory
Optimization
- Derives sufficient conditions for exact recovery of the mean using FKL and α-divergences.
- Extends previous results on robust variational inference under target symmetries.
- Provides guidelines for choosing appropriate variational families based on theoretical insights.
- Highlights potential optimization failures when sufficient conditions are not met.
Summary
This paper extends the theoretical foundations of variational inference (VI) under the presence of symmetries in target distributions. The authors build upon previous work that established conditions for the exact recovery of statistical characteristics, specifically the mean, when using location-scale families as variational approximations. They derive sufficient conditions for the forward Kullback-Leibler (FKL) divergence and α-divergences to ensure exact recovery of the mean, addressing gaps in existing literature that primarily focused on the reverse Kullback-Leibler divergence (RKL). The paper discusses how these conditions can inform the selection of variational families and highlights potential pitfalls in optimization when these conditions are not satisfied. The findings contribute to a deeper understanding of how to effectively utilize variational inference in practical applications, particularly when dealing with misspecified models.
Methodology
The authors utilize theoretical analysis to derive conditions under which variational inference can accurately recover the mean of a target distribution. They build on existing literature regarding f-divergences, particularly focusing on FKL and α-divergences, and establish new sufficient conditions for these divergences to yield unique minimizers.
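For reference, the divergences involved can be written in one common (Amari-style) convention; the paper's own parameterisation of α may differ.

```latex
% One common (Amari-style) convention; notation may differ from the paper's.
\begin{align*}
  \mathrm{RKL}(q_\theta \,\|\, p) &= \int q_\theta(x)\,\log\frac{q_\theta(x)}{p(x)}\,dx,
  &
  \mathrm{FKL}(p \,\|\, q_\theta) &= \int p(x)\,\log\frac{p(x)}{q_\theta(x)}\,dx,
  \\
  D_\alpha(p \,\|\, q_\theta) &= \frac{1}{\alpha(1-\alpha)}
    \Bigl(1 - \int p(x)^{\alpha}\, q_\theta(x)^{1-\alpha}\,dx\Bigr),
  & \alpha &\notin \{0, 1\}.
\end{align*}
```

In this convention the α-divergence recovers the FKL as α → 1 and the RKL as α → 0, which is why conditions stated for the FKL and for the α-family complement the earlier RKL results.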
Results
The paper presents complementary sufficient conditions for the exact recovery of the mean when optimizing the FKL and α-divergences. For the FKL, mild assumptions on the base distribution are sufficient, while for the α-divergence, a more detailed criterion is provided that depends on the value of α. These results enhance the understanding of variational inference in the context of symmetries and misspecification.
Implications
The findings have significant implications for practitioners using variational inference in machine learning and statistics, particularly in scenarios where the target distribution is not well-represented by the chosen variational family. The theoretical guarantees can aid in the design of more robust inference methods and improve the reliability of statistical estimates derived from variational approaches.
A Green-Integral-Constrained Neural Solver with Stochastic Physics-Informed Regularization
Theory
Efficient ML
Optimization
- Introduction of a Green-Integral neural solver for the Helmholtz equation.
- Elimination of second-order spatial derivatives and boundary layers through integral representation.
- Significant reduction in computational cost and training time via FFT-based convolution.
- Hybrid GI+PDE loss improves accuracy in strong scattering regions.
Summary
This paper addresses the limitations of standard Physics-Informed Neural Networks (PINNs) in simulating oscillatory Helmholtz solutions in heterogeneous media. The authors propose a novel Green-Integral (GI) neural solver that utilizes an integral representation to enforce wave physics, thereby avoiding the computational challenges associated with second-order PDE residuals. This approach allows for direct encoding of oscillatory behavior and outgoing radiation, eliminating the need for artificial boundary layers. The optimization of the GI loss via a neural network functions as a spectrally tuned preconditioned iteration, enhancing convergence in complex media. The method employs FFT-based convolution to accelerate loss evaluation, significantly reducing GPU memory usage and training time. To enhance local accuracy in regions with strong scattering, a hybrid GI+PDE loss is introduced, which combines the GI approach with a lightweight Helmholtz residual at selectively sampled collocation points. The proposed method is evaluated on seismic benchmark models, demonstrating consistent performance improvements over traditional PDE-based PINNs, achieving over tenfold reductions in computational costs. The hybrid loss approach yields the most accurate reconstructions in scenarios with localized scattering, establishing a stable and efficient alternative for wavefield modeling.
Methodology
The authors developed a Green-Integral neural solver that utilizes an integral representation to enforce wave physics, avoiding the traditional PDE-residual-based formulation. The optimization of the GI loss is performed using a neural network, and FFT-based convolution is employed to enhance efficiency. A hybrid loss function is also introduced to improve local accuracy in regions with strong scattering.
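The sketch below shows why an FFT makes the integral representation cheap to evaluate: a Lippmann–Schwinger-style residual is computed with a single circular convolution of a Green-type kernel against the scattering source, with no second-order derivatives. The kernel, scaling, and boundary handling are placeholders, not the paper's actual discretisation.

```python
# Hedged sketch: a Green-integral residual  r = u - u0 - G * (k0^2 * dm * u)
# evaluated with FFT-based circular convolution on a 2-D grid.
import numpy as np

def fft_convolve2d(kernel: np.ndarray, field: np.ndarray) -> np.ndarray:
    """Circular 2-D convolution via the FFT (kernel and field share the grid)."""
    return np.real(np.fft.ifft2(np.fft.fft2(kernel) * np.fft.fft2(field)))

def green_integral_residual(u, u0, green_kernel, k0, dm):
    scatter_source = (k0 ** 2) * dm * u
    return u - u0 - fft_convolve2d(green_kernel, scatter_source)

# Toy grid: constant background field, a localised slowness perturbation, and a
# smooth placeholder kernel standing in for the true Helmholtz Green function.
n = 64
y, x = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
r2 = (x - n // 2) ** 2 + (y - n // 2) ** 2
green_kernel = np.exp(-r2 / 50.0)                 # placeholder kernel, not the true G
dm = np.zeros((n, n)); dm[20:28, 20:28] = 0.1     # localised scatterer
u0 = np.ones((n, n)); u = u0.copy()
loss = np.mean(green_integral_residual(u, u0, green_kernel, k0=2.0, dm=dm) ** 2)
print(f"GI-style loss on the toy grid: {loss:.4e}")
```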
Results
The GI-based training consistently outperformed traditional PDE-based PINNs, achieving over a tenfold reduction in computational costs. The hybrid loss approach provided the most accurate reconstructions in models with localized scattering, demonstrating the effectiveness of the proposed method.
Implications
This work has significant implications for wavefield modeling in various applications, including seismic imaging, medical ultrasound, and acoustics, by providing a more efficient and accurate method for simulating wave propagation in complex media.
Transferable Physics-Informed Representations via Closed-Form Head Adaptation
Theory
Optimization
Efficient ML
- Introduction of Pi-PINN, a framework for transferable physics-informed representations.
- Utilization of closed-form head adaptation to reduce computational costs in adapting to new PDE instances.
- Demonstrated synergy between data-driven multi-task learning and physics-informed losses.
- Empirical results show significant speed and accuracy improvements over traditional PINNs.
Summary
This paper introduces a novel approach to enhance the generalization and efficiency of Physics-Informed Neural Networks (PINNs) through a framework called Pseudoinverse PINN (Pi-PINN). Traditional PINNs, while effective in solving partial differential equations (PDEs), often struggle with generalization to new instances and require extensive retraining. The proposed Pi-PINN framework addresses these limitations by learning transferable physics-informed representations in a shared embedding space. This is achieved through closed-form head adaptation using a least-squares-optimal pseudoinverse under PDE constraints. The authors explore the integration of data-driven multi-task learning losses with physics-informed losses, leading to improved performance. Empirical results demonstrate that Pi-PINN can achieve predictions 100–1000 times faster than conventional PINNs, with 10–100 times lower relative error compared to typical data-driven models, even with minimal training data. The findings suggest that Pi-PINN significantly enhances the adaptability and efficiency of PINNs across various PDE problems, making it a promising tool for scientific and engineering applications.
Methodology
The Pi-PINN framework decouples the learning process into a shared embedding for transferable structures across related PDE instances and a task-specific output head that can be adapted using a closed-form linear solve. This approach minimizes the need for gradient-based re-optimization for new instances. The authors also analyze the combination of multi-task learning objectives with physics-informed residuals to enhance model performance.
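A minimal sketch of the closed-form head adaptation: with the shared trunk frozen, fitting a new instance reduces to one linear least-squares (pseudoinverse) solve for the output head. The random-feature trunk and the purely data-driven system below are illustrative assumptions; Pi-PINN additionally stacks PDE-constraint equations into the same linear solve.

```python
# Hedged sketch: adapt only the linear output head of a frozen embedding with
# a single least-squares solve instead of gradient-based re-optimization.
import numpy as np

rng = np.random.default_rng(0)
n_features = 64
W = rng.standard_normal((1, n_features))   # frozen trunk parameters (stand-in)
b = rng.standard_normal(n_features)

def shared_embedding(x: np.ndarray) -> np.ndarray:
    """Frozen nonlinear feature map standing in for the pretrained shared trunk."""
    return np.tanh(x @ W + b)

# New PDE instance: a handful of labelled samples (x_i, u_i).
x_train = rng.uniform(-1.0, 1.0, size=(20, 1))
u_train = np.sin(np.pi * x_train).ravel()               # toy target solution

phi = shared_embedding(x_train)                          # (20, 64) frozen features
head, *_ = np.linalg.lstsq(phi, u_train, rcond=None)     # closed-form head adaptation

x_test = np.linspace(-1.0, 1.0, 200)[:, None]
u_pred = shared_embedding(x_test) @ head                 # instant prediction, no retraining
print(f"max training residual: {np.abs(phi @ head - u_train).max():.3e}")
```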
Results
Pi-PINN achieved predictions 100–1000 times faster than conventional PINNs and produced results with 10–100 times lower relative error than typical data-driven models, even when trained on as few as two samples. The framework was validated on various PDE problems, including Poisson’s equation, Helmholtz equation, and Burgers’ equation.
Implications
The findings suggest that Pi-PINN can significantly improve the efficiency and generalization of PINNs, making them more applicable in real-world scenarios where rapid adaptation to new PDE instances is necessary. This has potential implications for various fields in science and engineering that rely on solving complex PDEs.
A Deep U-Net Framework for Flood Hazard Mapping Using Hydraulic Simulations of the Wupper Catchment
Efficient ML
- Development of a deep learning surrogate model for flood prediction using U-Net architecture.
- Significant reduction in computation time compared to traditional hydraulic simulations.
- Validation of the model using real hydraulic simulation data from the Wupper catchment.
- Demonstration of the model's ability to generalize across different topographies.
Summary
This paper addresses the urgent need for rapid and reliable flood prediction tools in light of increasing global flood events. Traditional hydraulic simulations, while accurate, are computationally expensive and slow, making them unsuitable for real-time applications. The authors propose a deep-learning-based surrogate model utilizing a U-Net architecture to predict maximum water levels across a grid, effectively approximating hydraulic models. The framework was validated using hydraulic simulations from the Wupper catchment in North-Rhine Westphalia, Germany, demonstrating that the deep learning model can provide comparable accuracy to traditional methods while significantly reducing computation time. This research highlights the potential of deep learning to enhance flood hazard mapping and improve emergency response capabilities.
Methodology
The authors optimized a U-Net architecture for flood prediction by conducting experiments on patch generation and data handling. The model was trained on hydraulic simulation data to learn the relationship between hydraulic features and water levels, allowing for rapid predictions of flood extents.
Results
The deep learning surrogate model achieved results comparable to traditional hydraulic simulations while being significantly faster, demonstrating its potential as a viable alternative for real-time flood hazard mapping.
Implications
This research could lead to the development of more efficient flood prediction systems that can be deployed in emergency situations, ultimately improving disaster response and risk management strategies in flood-prone areas.
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences
NLP
Large Language Models
Efficient ML
- Introduction of preconditioning to delta-rule recurrences enhances optimization by accounting for curvature in least-squares loss.
- Derivation of equivalences between linear attention and DeltaNet in the context of preconditioning.
- Development of efficient chunkwise parallel algorithms for Preconditioned Linear Attention (PLA) and Preconditioned DeltaNet (PDN).
- Empirical improvements in performance on synthetic tasks and language modeling at large scales.
Summary
This paper addresses the limitations of softmax attention in long-context computations by introducing preconditioned variants of recurrent models, specifically DeltaNet, Gated DeltaNet (GDN), and Kimi Delta Attention (KDA). The authors propose a curvature-aware approach to sequence modeling through the lens of online least squares, which allows for the incorporation of second-order information into the updates of these models. By deriving equivalences between linear attention and DeltaNet under exact preconditioning, they develop efficient chunkwise parallel algorithms for these preconditioned models. The empirical results demonstrate that the preconditioned delta-rule recurrences consistently outperform existing methods on synthetic recall benchmarks and language modeling tasks at scales of 340M and 1B parameters, indicating the effectiveness of their approach in improving performance while maintaining computational efficiency.
Methodology
The authors leverage the theory of online least squares to derive preconditioned updates for delta-rule recurrences. They implement a diagonal approximation for preconditioning, allowing for efficient computation and training of the models. The methodology includes chunkwise parallel forms that enhance training throughput while maintaining the benefits of the proposed curvature-aware updates.
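One way to picture the idea is a sequential delta-rule update whose step is rescaled by a running diagonal estimate of the key Gram (curvature) matrix, as in the toy below. The exact preconditioner, gating, and the chunkwise-parallel formulation used in the paper differ from this naive loop.

```python
# Hedged sketch: delta-rule fast weights with a diagonal, curvature-aware step.
import numpy as np

def preconditioned_delta_rule(keys, values, beta=1.0):
    d_k, d_v = keys.shape[1], values.shape[1]
    S = np.zeros((d_v, d_k))                 # fast-weight state
    h = np.full(d_k, float(d_k))             # diagonal curvature accumulator (kept >= d_k for stability)
    for k, v in zip(keys, values):
        h += k * k                           # running second moment of key coordinates
        err = v - S @ k                      # delta-rule prediction error
        S += beta * np.outer(err, k / h)     # diagonally preconditioned rank-1 update
    return S

rng = np.random.default_rng(0)
T, d_k, d_v = 256, 16, 8
true_map = rng.standard_normal((d_v, d_k))
keys = rng.standard_normal((T, d_k))
values = keys @ true_map.T                   # realizable online least-squares stream
S = preconditioned_delta_rule(keys, values)
print(f"relative recovery error: {np.linalg.norm(S - true_map) / np.linalg.norm(true_map):.3f}")
```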
Results
The experiments show that preconditioned delta-rule recurrences yield consistent performance improvements across various benchmarks, particularly in synthetic recall tasks and language modeling at the 340M and 1B parameter scales. The results indicate that incorporating curvature information leads to better optimization and model performance.
Implications
The findings suggest that incorporating curvature-aware updates in sequence modeling can significantly enhance the performance of large language models and other applications requiring long-context processing. This approach may lead to more efficient training and inference in real-world applications, particularly in NLP tasks.
Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability
Time Series
- Temporal taskification is a structural component of evaluation in streaming CL.
- Different valid splits of the same data stream can lead to different CL regimes.
- The proposed framework allows for efficient diagnosis of taskification robustness before training.
- Shorter taskifications result in noisier patterns and greater sensitivity to boundary perturbations.
Summary
This paper investigates the impact of temporal taskification in Streaming Continual Learning (CL), arguing that the way continuous data streams are partitioned into discrete tasks significantly influences evaluation outcomes. The authors introduce a taskification-level framework that utilizes plasticity and stability profiles, profile distance, and Boundary-Profile Sensitivity (BPS) to assess the robustness of different task splits before training models. Through experiments on the CESNET-Timeseries24 dataset, they demonstrate that varying the temporal taskification leads to substantial differences in forecasting errors, forgetting, and backward transfer across several CL methods. The findings highlight that the choice of taskification is not merely a preprocessing step but a critical structural component that can alter benchmark conclusions, thus advocating for its recognition as a first-class evaluation variable in CL research.
Methodology
The authors developed a taskification-level framework that includes plasticity and stability profiles, profile distance, and Boundary-Profile Sensitivity (BPS) to analyze the effects of different temporal taskifications on CL performance. They conducted experiments using continual finetuning, Experience Replay, Elastic Weight Consolidation, and Learning without Forgetting on the CESNET-Timeseries24 dataset, varying the temporal splits while keeping other factors constant.
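The flavour of such a pre-training diagnostic can be sketched as follows: summarise each task segment by a simple profile, measure the distance between consecutive profiles, and check how much that measurement moves when every boundary is shifted slightly. The profile statistics and sensitivity score below are illustrative stand-ins, not the paper's definitions of the plasticity/stability profiles or BPS.

```python
# Hedged sketch: how sensitive a temporal taskification is to small boundary shifts.
import numpy as np

def task_profiles(stream, boundaries):
    """Summarise each task segment by simple statistics (illustrative profile)."""
    segments = np.split(stream, boundaries)
    return np.array([[seg.mean(), seg.std()] for seg in segments])

def profile_distance(profiles):
    """Mean L2 distance between consecutive task profiles."""
    return np.linalg.norm(np.diff(profiles, axis=0), axis=1).mean()

def boundary_sensitivity(stream, boundaries, shift=24):
    """Change in profile distance when every boundary is shifted by +/- `shift` steps."""
    base = profile_distance(task_profiles(stream, boundaries))
    shifted = [profile_distance(task_profiles(stream, [b + s for b in boundaries]))
               for s in (-shift, shift)]
    return max(abs(d - base) for d in shifted)

rng = np.random.default_rng(0)
stream = np.sin(np.arange(4320) / 144.0) + 0.1 * rng.standard_normal(4320)  # toy series
short_split = list(range(216, 4320, 216))    # many short tasks
long_split = list(range(1056, 4320, 1056))   # few long tasks
for name, b in [("short tasks", short_split), ("long tasks", long_split)]:
    print(name, round(boundary_sensitivity(stream, b, shift=24), 4))
```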
Results
The experiments revealed that different temporal taskifications (9-, 30-, and 44-day splits) led to significant variations in forecasting errors, forgetting rates, and backward transfer. Shorter taskifications were associated with higher profile distances and BPS, indicating increased sensitivity to changes in task boundaries. These results underscore the importance of taskification in evaluating CL methods.
Implications
The findings suggest that researchers in continual learning should carefully consider how they partition data streams into tasks, as this can fundamentally alter the perceived effectiveness of different learning strategies. This work encourages the development of more robust evaluation protocols that account for the variability introduced by taskification.
Fairness under uncertainty in sequential decisions
Reinforcement Learning
Theory
Optimization
- Introduces a taxonomy of uncertainty in sequential decision-making.
- Formalizes model and feedback uncertainty using counterfactual logic and reinforcement learning.
- Demonstrates the potential harms of ignoring unobserved outcomes in decision-making.
- Shows that uncertainty-aware exploration can improve fairness metrics.
Summary
This paper addresses the challenge of ensuring fairness in machine learning (ML) within the context of sequential decision-making, where decisions are made under uncertainty and feedback from prior decisions influences future choices. The authors introduce a taxonomy of uncertainty in sequential decision-making, categorizing it into model uncertainty, feedback uncertainty, and prediction uncertainty. They formalize these uncertainties using counterfactual logic and reinforcement learning techniques. The paper highlights the potential harms of naïve decision-making policies that overlook unobserved outcomes, particularly for historically marginalized groups. Through algorithmic examples, the authors demonstrate how to reduce outcome variance for disadvantaged groups while maintaining the decision maker's institutional objectives. The experiments conducted reveal that unequal uncertainty and selective feedback can lead to disparities in outcomes, and that uncertainty-aware exploration can improve fairness metrics. The framework proposed equips researchers and practitioners with tools to diagnose and govern fairness risks in sequential decision systems, emphasizing the need to account for uncertainty in fair decision-making.
Methodology
The authors utilize counterfactual logic and reinforcement learning techniques to formalize uncertainties in sequential decision-making. They conduct experiments on simulated data to illustrate the effects of unequal uncertainty and selective feedback on fairness outcomes.
Results
The results indicate that naïve policies can lead to compounding exclusion and reduced access for marginalized groups. The framework allows for a simultaneous reduction in outcome variance for these groups while preserving the decision maker's expected utility. The experiments show that accounting for uncertainty can significantly alter observed fairness metrics.
Implications
This research has significant implications for high-stakes decision-making in areas such as finance, healthcare, and criminal justice, where fairness is critical. It provides a structured approach for practitioners to assess and mitigate fairness risks in sequential decision systems, ultimately contributing to more equitable outcomes.
The Path Not Taken: Duality in Reasoning about Program Execution
Large Language Models
- Current benchmarks for LLMs in program execution are limited and prone to data contamination.
- The concept of duality in reasoning about program execution introduces forward and backward reasoning tasks.
- DEXBENCH, the proposed benchmark, evaluates LLMs on both execution and counterfactual reasoning.
- Dual-path reasoning provides a more reliable measure of causal understanding in program execution.
Summary
This paper addresses the limitations of current benchmarks for evaluating large language models (LLMs) in the context of program execution. Existing benchmarks primarily focus on predicting program properties based on specific inputs, which can lead to a narrow understanding of dynamic code reasoning and potential data contamination. The authors propose a duality in reasoning about program execution, introducing two complementary tasks: predicting a program's behavior for a given input (forward reasoning) and inferring how to mutate the input to achieve a specific behavioral objective (backward reasoning). This dual-path reasoning is operationalized in a new benchmark called DEXBENCH, which consists of 445 paired instances. The evaluation of 13 LLMs reveals that dual-path reasoning serves as a robust proxy for understanding dynamic code execution, highlighting that strong performance in isolated tasks does not guarantee success in joint evaluations. The findings suggest that a deeper causal understanding of execution flow is essential for improving LLM performance in software engineering tasks.
Methodology
The authors developed DEXBENCH, a benchmark that consists of 445 paired instances designed to evaluate LLMs on two reasoning tasks: forward reasoning (predicting observed behavior) and backward reasoning (inferring input mutations for alternative execution paths). The benchmark was constructed from real-world programs and evaluated across 13 LLMs, including both open-source and proprietary models.
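A toy illustration of the duality (not an actual DEXBENCH instance, which is drawn from real-world programs): the forward task predicts what a concrete input does, the backward task finds an input mutation that exercises the path not taken.

```python
# Hedged toy example of the forward/backward reasoning pair.

def classify(order_total: float, is_member: bool) -> str:
    if order_total >= 100 and is_member:
        return "free-shipping"
    if order_total >= 100:
        return "discounted-shipping"
    return "standard-shipping"

# Forward reasoning: predict the observed behaviour for a concrete input.
assert classify(120.0, is_member=False) == "discounted-shipping"

# Backward reasoning: infer a minimal input mutation that steers execution
# down the path not taken (here, the "free-shipping" branch).
assert classify(120.0, is_member=True) == "free-shipping"
```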
Results
The evaluation revealed that dual-path reasoning is a strong indicator of a model's causal understanding of program execution. It was found that models that performed well on isolated tasks did not necessarily excel in joint evaluations, indicating the limitations of traditional single-path benchmarks.
Implications
The findings suggest that enhancing LLMs' understanding of program execution through dual-path reasoning could improve their applicability in software engineering tasks, such as code generation, debugging, and program synthesis. This approach may lead to more reliable and robust AI systems in programming contexts.
TRAVELFRAUDBENCH: A Configurable Evaluation Framework for GNN Fraud Ring Detection in Travel Networks
Graph Learning
- TFG introduces a configurable framework for evaluating GNNs in fraud detection specific to travel networks.
- The framework simulates three distinct fraud ring types with a heterogeneous graph structure.
- GraphSAGE and RGCN-proj significantly outperform traditional MLP methods in fraud detection tasks.
- Detection capabilities vary across different fraud ring topologies, highlighting the need for tailored approaches.
Summary
This paper introduces TRAVELFRAUDBENCH (TFG), a novel evaluation framework designed to assess the effectiveness of graph neural networks (GNNs) in detecting fraud rings within travel networks. Unlike existing benchmarks that focus on single node types and edge relations, TFG simulates three distinct types of travel-related fraud rings—ticketing fraud, ghost hotel schemes, and account takeover rings—within a heterogeneous graph comprising nine node types and twelve edge relations. The framework allows for configurable parameters such as ring size, count, fraud rate, and type composition, facilitating controlled evaluation studies. The authors evaluate six detection methods, including MLP, GraphSAGE, and RGCN, demonstrating that GNNs significantly outperform traditional tabular methods in terms of area under the curve (AUC) and average precision (AP). The results indicate that GraphSAGE achieves the highest AUC of 0.992, confirming the importance of graph structure in fraud detection. The paper also highlights the unique detection challenges posed by different fraud ring topologies and provides insights into the primary discriminative signals for detection. TFG is released as an open-source Python package, contributing to the field by providing a comprehensive tool for evaluating GNN performance in fraud detection.
Methodology
The authors developed a synthetic graph generator that creates a heterogeneous graph with multiple node and edge types, simulating various fraud ring structures. They evaluated six GNN architectures using a controlled train/validation/test split to prevent label leakage, focusing on metrics such as AUC and average precision to assess detection performance.
Results
GraphSAGE achieved an AUC of 0.992, outperforming the MLP baseline by 5.5 percentage points. RGCN-proj also performed well with an AUC of 0.987. The average precision improved significantly for GNNs compared to MLP, with GraphSAGE showing a 16.1 percentage point increase. In terms of ring recovery, GraphSAGE achieved 100% recovery across all ring types, while MLP only recovered 17–88%, underscoring the effectiveness of GNNs in this context.
Implications
The development of TFG has significant implications for the field of fraud detection, particularly in travel networks. It provides a robust framework for evaluating GNNs, which can lead to improved detection methods and strategies. The open-source nature of TFG encourages further research and application in various domains facing similar fraud challenges.
Interpretable Quantile Regression by Optimal Decision Trees
Interpretability
- Introduces Quantile DL8.5 (QDL8.5) for optimal quantile regression trees.
- Provides predictions for the complete conditional distribution without prior distribution assumptions.
- Enhances interpretability and robustness by learning multiple trees for different quantiles.
- Achieves high accuracy with minimal computational overhead compared to traditional methods.
Summary
This paper introduces a novel method for learning a set of optimal quantile regression trees, addressing the growing demand for machine learning models that are both accurate and interpretable. The proposed method, named Quantile DL8.5 (QDL8.5), allows for predictions of the complete conditional distribution of a target variable without requiring prior assumptions about its distribution. By learning multiple optimal trees for different quantiles, QDL8.5 enhances interpretability and robustness, particularly in applications where understanding the prediction process is crucial. The authors argue that traditional single quantile regression trees limit insights into the target distribution and may lead to suboptimal predictions. The QDL8.5 method efficiently learns these trees with minimal computational overhead compared to learning a single tree, thus maintaining algorithmic efficiency. The paper also provides a robust assessment of the method's accuracy, execution time, and interpretability, demonstrating its effectiveness in practical applications.
Methodology
The authors extend the DL8.5 algorithm to perform quantile regression by learning one optimal tree per quantile while exploring the tree space only once. This approach allows for efficient learning of multiple trees corresponding to different quantiles, thus providing a comprehensive view of the target distribution.
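The per-leaf objective behind quantile trees can be sketched directly: for quantile level τ, the leaf error is the pinball loss, and the best constant leaf prediction is an empirical τ-quantile. The branch-and-bound search and caching that make the tree optimal in DL8.5/QDL8.5 are omitted here.

```python
# Hedged sketch: the pinball loss and the optimal constant leaf value per quantile.
import numpy as np

def pinball_loss(y: np.ndarray, pred: float, tau: float) -> float:
    diff = y - pred
    return float(np.where(diff >= 0, tau * diff, (tau - 1.0) * diff).sum())

def best_leaf_value(y: np.ndarray, tau: float) -> float:
    """An empirical tau-quantile minimises the pinball loss for a constant leaf."""
    return float(np.quantile(y, tau))

rng = np.random.default_rng(0)
y_leaf = rng.lognormal(mean=0.0, sigma=0.75, size=200)   # skewed targets in one leaf
for tau in (0.1, 0.5, 0.9):
    v = best_leaf_value(y_leaf, tau)
    print(f"tau={tau}: leaf value={v:.3f}, leaf loss={pinball_loss(y_leaf, v, tau):.2f}")
```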
Results
The QDL8.5 method demonstrates high accuracy and interpretability in predictions, with a robust assessment showing that it can learn multiple quantile regression trees with virtually no increase in computational time compared to learning a single tree.
Implications
The proposed method has significant implications for fields requiring interpretable machine learning models, such as healthcare and business strategy, where understanding the reasoning behind predictions is as important as the predictions themselves. It allows for better decision-making by providing insights into the entire distribution of target variables.
Droplet-LNO: Physics-Informed Laplace Neural Operators for Accurate Prediction of Droplet Spreading Dynamics on Complex Surfaces
Theory
Efficient ML
Optimization
- Introduction of PI-LNO for modeling droplet dynamics on complex surfaces.
- Significant reduction in computational time compared to traditional CFD methods.
- Outperforms existing state-of-the-art models in accuracy and efficiency.
- Utilizes a physics-regularized loss function to ensure physically feasible predictions.
Summary
This paper introduces the Physics-Informed Laplace Neural Operator (PI-LNO), a novel architecture designed to accurately predict the dynamics of liquid droplet spreading on complex surfaces. Traditional computational fluid dynamics (CFD) simulations are computationally expensive, often requiring 18 to 24 hours for transient computations. The PI-LNO leverages the Laplace integral transform to model the exponential transient dynamics of droplet spreading, significantly improving computational efficiency. The authors conducted extensive benchmark studies against five state-of-the-art methods, including UNet and DeepONet, demonstrating that PI-LNO achieves a mean R² of 0.9009 across various spreading times, outperforming the other models. The architecture employs a physics-regularized composite loss function that combines data fidelity metrics with physical constraints from the Navier-Stokes equations and Cahn-Hilliard model. Training on multi-surface CFD data, PI-LNO shows a remarkable speedup of approximately 23,400 times compared to traditional CFD methods, enabling real-time applications. The results indicate that PI-LNO can effectively model transient multiphase dynamics, making it a valuable tool for engineering applications requiring rapid simulations and optimizations.
Methodology
The PI-LNO architecture is trained using multi-surface CFD data, employing a physics-regularized composite loss function that integrates data fidelity metrics (MSE, MAE, RMSE) with physical constraints from the Navier-Stokes equations and Cahn-Hilliard model. The model captures the transient dynamics of droplet spreading through complex Laplace transforms, allowing for efficient representation of exponential decay and inertio-capillary oscillations.
Results
PI-LNO achieved a mean R² of 0.9009 across four intermediate spreading times, with absolute errors localized to contact-line regions. The model demonstrated an RMSE of 1.4731×10⁻³ and inference times of 2.8 ms, representing a 23,400× speedup over traditional CFD simulations. It also delivered R² values exceeding 0.99 for all field variables, including velocity, pressure, and phase-field.
Implications
The PI-LNO framework provides a physics-aware surrogate for accelerated parametric optimization and multi-surface wettability design. Its ability to perform real-time simulations can significantly enhance the design and control of engineering systems where transient multiphase dynamics are critical, such as in inkjet printing and biomedical microfluidics.
HARBOR: Automated Harness Optimization
NLP
Large Language Models
Optimization
- Harness design is a critical aspect of deploying long-horizon language models, often overshadowing the model itself.
- HARBOR formalizes Automated Harness Optimization as a constrained noisy optimization problem.
- A case study reveals that manual tuning is often ineffective, with only one out of four rounds achieving a statistically significant improvement.
- The paper advocates for treating harness tuning as a hyper-parameter optimization problem.
Summary
The paper presents HARBOR, a novel approach to Automated Harness Optimization (AHO) for long-horizon language-model agents. It argues that the design of the harness, which encompasses various operational complexities, is a critical machine-learning problem. The authors formalize AHO as a constrained noisy Bayesian optimization problem over a mixed-variable configuration space, introducing a reference algorithm, HARBOR, which utilizes a block-additive surrogate model and cost-aware acquisition strategies. A case study is conducted using a flag-gated harness in a production coding agent, demonstrating the limitations of manual tuning through a controlled four-round experiment. The results indicate that only one tuning round outperformed the baseline, highlighting the challenges of manual tuning and the necessity for rigorous automated optimization methods. The findings suggest that harness tuning should be treated with the same level of rigor as hyper-parameter optimization, emphasizing the need for systematic approaches in harness design.
Methodology
The authors formalize Automated Harness Optimization as a constrained noisy Bayesian optimization problem, utilizing a block-additive surrogate model and multi-fidelity cost-aware acquisition strategies. The HARBOR algorithm incorporates a posterior chance-constrained safety check and is tested through a case study involving a flag-gated harness in a production coding agent.
Results
In the four tuning rounds conducted, only the second round achieved a statistically credible improvement over the baseline. The results showed that additional features often led to decreased performance due to issues such as ineffective self-evaluation and integration bugs, underscoring the complexity of harness optimization.
Implications
The findings suggest that automated approaches to harness optimization can significantly enhance the performance of language-model agents, making it essential for teams to adopt systematic methods rather than relying on manual tuning. This could lead to more efficient and effective deployment of AI systems in various applications.
Dilated CNNs for Periodic Signal Processing: A Low-Complexity Approach
Time Series
Efficient ML
Audio & Speech
- R-DCNN offers a low-complexity solution for denoising periodic signals.
- The method requires only a single observation for training, enhancing efficiency.
- R-DCNN can generalize across signals with varying frequencies through resampling.
- Performance is comparable to classical autoregressive methods and conventional DCNNs.
Summary
This paper presents a novel approach for denoising periodic signals using a Dilated Convolutional Neural Network (DCNN) combined with a resampling technique, termed R-DCNN. The method is designed to operate efficiently under strict power and resource constraints, making it suitable for applications in IoT devices and other low-power environments. Unlike traditional deep learning methods that require extensive computational resources and separate training for each signal observation, R-DCNN is trained using a single observation and can generalize to other signals with varying fundamental frequencies through a lightweight resampling step. This allows the same network weights to be reused across different signals, significantly reducing computational complexity. The experimental results demonstrate that R-DCNN achieves performance comparable to state-of-the-art classical methods, such as autoregressive techniques, while maintaining low computational demands. This efficiency makes R-DCNN particularly advantageous for real-time signal processing tasks where resource limitations are a concern.
Methodology
The proposed R-DCNN utilizes dilated convolutions to capture long-range temporal dependencies in periodic signals. The model is trained on a single observation, and during inference, it employs a resampling technique to align the time scales of varying frequency signals, allowing the reuse of fixed network weights without retraining.
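A minimal sketch of the resampling step: warp any periodic signal to a canonical number of samples per period so one set of fixed network weights applies, then warp the result back. The fundamental-frequency estimate is assumed given, and the dilated network itself is replaced by an identity placeholder.

```python
# Hedged sketch: resample to a canonical samples-per-period before reusing a fixed net.
import numpy as np

CANONICAL_SAMPLES_PER_PERIOD = 64

def resample(signal: np.ndarray, n_out: int) -> np.ndarray:
    """Linear-interpolation resampling to n_out samples."""
    x_old = np.linspace(0.0, 1.0, num=signal.size, endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, signal)

def denoise_fixed_network(x: np.ndarray) -> np.ndarray:
    """Placeholder for the trained R-DCNN operating at the canonical rate."""
    return x  # identity stand-in

def process(signal: np.ndarray, samples_per_period: float) -> np.ndarray:
    n_periods = signal.size / samples_per_period
    n_canonical = int(round(n_periods * CANONICAL_SAMPLES_PER_PERIOD))
    warped = resample(signal, n_canonical)        # align time scale with the training signal
    cleaned = denoise_fixed_network(warped)       # reuse fixed weights, no retraining
    return resample(cleaned, signal.size)         # map back to the original rate

fs, f0 = 8000.0, 173.0                            # toy sampling rate and fundamental frequency
t = np.arange(4096) / fs
noisy = np.sin(2 * np.pi * f0 * t) + 0.3 * np.random.default_rng(0).standard_normal(t.size)
out = process(noisy, samples_per_period=fs / f0)
```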
Results
The experiments show that R-DCNN achieves high accuracy in signal denoising while significantly reducing computational complexity compared to existing methods, including both deep learning DCNNs and classical autoregressive techniques. This demonstrates the method's effectiveness in processing periodic signals with varying periods.
Implications
The R-DCNN approach has significant implications for real-time signal processing applications in fields such as speech recognition, medical diagnostics, and IoT devices, where computational resources are limited. Its ability to maintain performance while reducing complexity opens up new possibilities for deploying deep learning techniques in edge computing environments.
FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels
NLP
Large Language Models
Efficient ML
- FairyFuse is the first ternary-weight GEMV kernel on x86 CPUs that eliminates all floating-point multiplications.
- The system consolidates eight sub-GEMVs into a single SIMD-friendly loop, achieving a 1.55× speedup over unfused execution.
- Ternary packing shifts computational efficiency from GPUs to CPUs, making CPUs the preferred target for extreme quantization.
- FairyFuse achieves competitive throughput with 4-bit baselines while utilizing only 2-bit storage.
Summary
The paper introduces FairyFuse, an innovative inference system designed for large language models (LLMs) that utilizes ternary weights to eliminate floating-point multiplications during execution on CPU-only platforms. Traditional inference systems often dequantize weights and perform multiplications, which limits the efficiency gains from weight quantization. FairyFuse addresses this by employing ternary weights, which can be represented as -1, 0, and +1, allowing weight-activation products to be computed using conditional additions and subtractions instead of multiplications. The authors detail a novel approach that fuses multiple real-valued sub-GEMVs into a single AVX-512 loop, ensuring that the inner loop contains no floating-point multiplications. This method significantly enhances performance on bandwidth-limited CPUs, achieving a 29.6× speedup compared to FP32 implementations. The end-to-end performance of FairyFuse is demonstrated with a throughput of 32.4 tokens per second on an Intel Xeon 8558P, outperforming existing systems while maintaining near-lossless quality in model outputs.
Methodology
The authors developed FairyFuse by packing sixteen ternary weights into a single 32-bit word and utilizing AVX-512 masked additions and subtractions to compute the results without any multiplications. They designed a Fused Widely-Linear Kernel that integrates multiple sub-GEMVs into one efficient loop, optimizing memory access and computation through techniques such as mask reuse and register-resident accumulation.
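The core trick can be sketched in Python, with the caveat that the real kernel packs sixteen 2-bit trits per 32-bit word and uses AVX-512 masked adds in C; the code below is only a schematic analogue showing that a ternary dot product needs no multiplications.

```python
# Hedged sketch: 2-bit ternary packing and a multiplication-free dot product.
import numpy as np

def pack_ternary(w: np.ndarray) -> np.ndarray:
    """Pack ternary weights (values in {-1, 0, 1}) into 2-bit codes, 16 per uint32."""
    codes = (w + 1).astype(np.uint32)               # {-1, 0, 1} -> {0, 1, 2}
    codes = codes.reshape(-1, 16)
    shifts = np.arange(16, dtype=np.uint32) * 2
    return (codes << shifts).sum(axis=1).astype(np.uint32)

def unpack_ternary(words: np.ndarray, n: int) -> np.ndarray:
    shifts = np.arange(16, dtype=np.uint32) * 2
    codes = (words[:, None] >> shifts) & 0x3
    return codes.reshape(-1)[:n].astype(np.int8) - 1

def ternary_dot(w_ternary: np.ndarray, x: np.ndarray) -> float:
    """Multiplication-free dot product: add where w = +1, subtract where w = -1."""
    return float(x[w_ternary == 1].sum() - x[w_ternary == -1].sum())

rng = np.random.default_rng(0)
n = 64
w = rng.integers(-1, 2, size=n).astype(np.int8)
x = rng.standard_normal(n).astype(np.float32)
packed = pack_ternary(w)
assert np.array_equal(unpack_ternary(packed, n), w)
assert np.isclose(ternary_dot(w, x), float(w.astype(np.float32) @ x), atol=1e-4)
```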
Results
FairyFuse achieved a throughput of 32.4 tokens per second on a single Intel Xeon 8558P socket, outperforming the llama.cpp Q4_K_M by 1.24× while maintaining near-lossless quality (WikiText-2 perplexity of 5.52 compared to 5.47 for FP16). The implementation confirmed zero floating-point multiplication instructions in the inner loop and demonstrated a 29.6× speedup over FP32 kernels.
Implications
The findings suggest that ternary quantization can significantly enhance the efficiency of LLM inference on CPU platforms, making it feasible for deployment in resource-constrained environments such as edge servers and on-device applications. This could lead to broader adoption of LLMs in privacy-sensitive applications where GPU resources are limited.
ARFBench: Benchmarking Time Series Question Answering Ability for Software Incident Response
Time Series
Multimodal
Large Language Models
- ARFBench is the first benchmark specifically designed for evaluating TSQA in software incident response.
- Frontier VLMs outperform existing models, with GPT-5 achieving 62.7% accuracy and a 51.9% F1 score.
- Hybrid TSFM-VLM models show promise, achieving performance comparable to leading models.
- A model-expert oracle demonstrates complementary strengths, establishing a new superhuman frontier for TSQA.
Summary
This paper introduces ARFBench, a benchmark designed to evaluate the time series question-answering (TSQA) capabilities of multimodal foundation models (FMs) in the context of software incident response. The benchmark consists of 750 questions derived from 142 time series and 5.38 million data points from 63 production incidents at Datadog. The authors assess various leading proprietary and open-source models, including large language models (LLMs) and vision-language models (VLMs), finding that frontier VLMs significantly outperform existing baselines, with the top model (GPT-5) achieving 62.7% accuracy and 51.9% F1 score. The study also explores a novel hybrid model combining time series foundation models (TSFMs) with VLMs, which shows comparable performance to leading models. Additionally, the authors highlight the complementary strengths of models and human domain experts, proposing a model-expert oracle that achieves 82.8% F1 and 87.2% accuracy, setting a new benchmark for future TSQA models. The benchmark is publicly available for further research.
Methodology
The authors developed ARFBench by generating 750 question-answer pairs from real software incidents, utilizing internal telemetry data. They evaluated various foundation models, including LLMs, VLMs, and TSFMs, and introduced a hybrid modeling approach combining TSFMs with VLMs. The questions were categorized into tiers based on reasoning complexity, and the performance of models was benchmarked against human experts.
Results
The evaluation revealed that frontier VLMs, particularly GPT-5, achieved 62.7% accuracy and 51.9% F1 score, significantly outperforming baseline models. The hybrid TSFM-VLM model achieved comparable performance to leading models. The model-expert oracle achieved an F1 score of 82.8% and an accuracy of 87.2%, indicating a new benchmark for TSQA capabilities.
Implications
The findings suggest that specialized multimodal approaches can enhance the performance of models in time series question answering, particularly in critical domains like software incident response. The benchmark can serve as a foundation for future research and development of more effective TSQA models.
Sink-Token-Aware Pruning for Fine-Grained Video Understanding in Efficient Video LLMs
Computer Vision
Large Language Models
Efficient ML
- Existing visual token pruning methods are inadequate for fine-grained video understanding tasks.
- Sink tokens significantly hinder model performance by distorting visual evidence.
- SToP introduces a sink score to effectively prune semantically uninformative tokens.
- The method shows substantial performance improvements across diverse benchmarks.
Summary
This paper addresses the challenge of high inference latency in Video Large Language Models (Video LLMs) caused by the large number of visual tokens processed. Existing training-free visual token pruning methods have shown effectiveness primarily on coarse-grained tasks, such as Multiple-Choice Question Answering (MCQA), but they struggle with fine-grained tasks that require precise visual grounding. The authors identify 'sink tokens'—tokens that attract excessive attention but provide little semantic information—as a significant obstacle to fine-grained understanding. To mitigate this issue, they propose Sink-Token-aware Pruning (SToP), a method that quantifies the sink tendency of tokens and integrates this information into existing pruning techniques. SToP enhances the performance of state-of-the-art pruning methods (VisionZip, FastVid, Holitom) across various benchmarks, including hallucination evaluation and open-ended generation, demonstrating significant performance improvements even with up to 90% of visual tokens pruned. The findings suggest that addressing sink tokens is crucial for maintaining fine-grained visual understanding in Video LLMs.
Methodology
The authors conducted a systematic analysis to identify sink tokens and their impact on model performance. They developed Sink-Token-aware Pruning (SToP), which incorporates a sink score to prioritize the pruning of sink tokens while applying existing spatial and temporal pruning methods. The effectiveness of SToP was validated by integrating it with state-of-the-art pruning techniques and evaluating performance across multiple benchmarks.
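One plausible way to picture sink-aware pruning is sketched below: score each visual token by how much attention it attracts relative to how much value-vector content it carries, drop the most sink-like tokens outright, and only then apply attention-based importance pruning to the rest. The specific score used here is an illustrative stand-in, not the paper's sink score.

```python
# Hedged sketch: prune sink-like tokens first, then apply attention-based pruning.
import numpy as np

def sink_aware_keep_indices(attn, values, keep_ratio=0.1, sink_frac=0.05):
    """attn: (n, n) attention weights; values: (n, d) value vectors per visual token."""
    received = attn.sum(axis=0)                        # attention mass each token attracts
    content = np.linalg.norm(values, axis=1)           # proxy for semantic content
    sink_score = received / (content + 1e-6)           # high: attracts attention, carries little
    n = attn.shape[0]
    sinks = np.argsort(sink_score)[-int(sink_frac * n):]   # most sink-like tokens
    importance = received.copy()
    importance[sinks] = -np.inf                        # prune sink tokens first
    return np.argsort(importance)[-max(1, int(keep_ratio * n)):]

rng = np.random.default_rng(0)
n, d = 512, 64
attn = rng.dirichlet(np.ones(n), size=n)               # rows sum to 1, like softmax attention
values = rng.standard_normal((n, d))
values[:4] *= 0.05                                     # make the first few tokens low-content...
attn[:, :4] += 0.5
attn /= attn.sum(axis=1, keepdims=True)                # ...while attracting lots of attention
kept = sink_aware_keep_indices(attn, values)
print("injected sink tokens kept:", np.intersect1d(kept, np.arange(4)))   # expected: none
```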
Results
The implementation of SToP led to significant performance enhancements in fine-grained video understanding tasks, particularly in hallucination evaluation and open-ended generation. The results indicated that SToP could maintain high accuracy while pruning up to 90% of visual tokens, showcasing its effectiveness compared to traditional pruning methods.
Implications
The findings suggest that improving token pruning strategies by addressing sink tokens can lead to more efficient and effective Video LLMs, making them more suitable for real-world applications that require fine-grained visual understanding. This could enhance various applications in video analysis, content generation, and interactive AI systems.
Domain-Aware Hierarchical Contrastive Learning for Semi-Supervised Generalization Fault Diagnosis
Time Series
- Introduces DAHCL framework to improve fault diagnosis under unseen conditions.
- Addresses pseudo-label bias by incorporating domain-specific geometric characteristics.
- Utilizes uncertain samples effectively through fuzzy contrastive supervision.
- Evaluates performance under realistic noisy conditions, enhancing practical applicability.
Summary
This paper addresses the challenges of fault diagnosis under unseen operating conditions, particularly when labeled data is scarce. The authors propose a novel framework called Domain-Aware Hierarchical Contrastive Learning (DAHCL) for Semi-Supervised Domain Generalization Fault Diagnosis (SSDGFD). The framework tackles two main limitations of existing methods: the generation of biased pseudo-labels due to neglecting domain-specific geometric discrepancies, and the inefficient utilization of unlabeled samples through a rigid accept-or-discard strategy. DAHCL introduces a Domain-Aware Learning (DAL) module that captures the geometric characteristics of source domains to calibrate pseudo-label predictions, thereby reducing cross-domain bias. Additionally, a Hierarchical Contrastive Learning (HCL) module is developed, which employs dynamic confidence stratification and fuzzy contrastive supervision to allow uncertain samples to contribute to representation learning without relying on hard labels. The framework is evaluated under realistic conditions incorporating engineering noise, demonstrating superior robustness and domain generalization capabilities across three benchmark datasets, outperforming existing SSDGFD baselines.
Methodology
The DAHCL framework consists of two main components: the Domain-Aware Learning (DAL) module, which calibrates pseudo-label predictions based on geometric characteristics of source domains, and the Hierarchical Contrastive Learning (HCL) module, which integrates dynamic confidence stratification with fuzzy contrastive supervision to leverage uncertain samples for representation learning.
Results
Extensive experiments conducted on three benchmark datasets show that DAHCL consistently outperforms advanced SSDGFD baselines, exhibiting enhanced robustness and domain generalization capabilities, particularly under severe noise and substantial domain shifts.
Implications
The proposed framework has significant implications for industrial applications where reliable fault diagnosis is critical, especially in environments with varying operating conditions and limited labeled data. It can improve the efficiency and accuracy of diagnostic models in real-world scenarios.
IRIS: Interpolative Rényi Iterative Self-play for Large Language Model Fine-Tuning
NLP
Large Language Models
Optimization
- IRIS provides a unified framework for self-play fine-tuning using Rényi divergence.
- The framework allows for adaptive adjustment of the divergence objective based on training stages.
- Empirical results show significant performance improvements over existing self-play methods.
- IRIS achieves competitive results with fewer annotated samples compared to standard supervised fine-tuning.
Summary
The paper introduces IRIS (Interpolative Rényi Iterative Self-play), a novel framework for fine-tuning large language models (LLMs) that leverages self-play mechanisms to enhance model performance without requiring additional human annotations. Traditional self-play fine-tuning methods often rely on fixed divergence regimes, which can lead to suboptimal learning dynamics at various training stages. IRIS addresses this limitation by employing a Rényi-based approach that allows for a continuously adjustable objective through an order parameter α. This parameter enables the framework to interpolate between different divergence types, such as KL divergence and χ2 divergence, optimizing learning based on the distributional gap between model outputs and target distributions. The authors establish the theoretical foundation of IRIS, demonstrating its fixed-point property and the control of gradient concentration via α. Empirical evaluations on the Zephyr-7B and Qwen2.5-3B models across ten benchmarks reveal that IRIS significantly outperforms existing methods, achieving an average score of 44.57% with only 26k annotated samples, surpassing traditional supervised fine-tuning that utilizes a full dataset of 200k samples.
Methodology
IRIS employs a Rényi-based self-play fine-tuning framework that decomposes the learning objective into two independent tilted risk terms over annotated and synthetic data. The order parameter α is used to control the importance weights, allowing for interpolation among various divergence regimes throughout the training process. An adaptive order schedule adjusts α according to the distributional gap, optimizing learning dynamics at different stages.
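As a rough illustration of the tilting idea behind the order parameter α (not the paper's risk decomposition), the snippet below shows how α can reweight per-sample losses via exponentially tilted log density ratios, and how α might be scheduled from a measured distributional gap. The function names and the sigmoid schedule are assumptions.

```python
import torch

def renyi_tilted_weights(log_ratio, alpha):
    """Exponentially tilted per-sample weights: larger alpha concentrates
    weight on the samples with the largest log density ratio (the worst-
    covered ones); alpha near 0 gives nearly uniform weights. The weights
    are normalised to have mean 1 so they can rescale a per-sample loss."""
    return torch.softmax(alpha * log_ratio, dim=0) * log_ratio.numel()

def adaptive_alpha(dist_gap, alpha_min=1.0, alpha_max=2.0, gap_scale=1.0):
    """Toy schedule: push alpha toward alpha_max when the measured
    distributional gap is large, and relax it back as the gap closes."""
    g = torch.sigmoid(torch.as_tensor(dist_gap / gap_scale, dtype=torch.float32))
    return alpha_min + (alpha_max - alpha_min) * g.item()

# Reweight a per-sample loss on synthetic self-play data (all values are stand-ins).
log_ratio = torch.randn(8)
per_sample_loss = torch.rand(8)
alpha = adaptive_alpha(dist_gap=0.7)
loss = (renyi_tilted_weights(log_ratio, alpha) * per_sample_loss).mean()
```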
Results
The experiments conducted on the Zephyr-7B and Qwen2.5-3B models demonstrated that IRIS achieved an average score of 44.57% across ten benchmarks, outperforming baseline methods. Notably, IRIS with only 26k annotated samples surpassed the performance of standard supervised fine-tuning that utilized the full 200k dataset.
Implications
The findings suggest that IRIS can significantly reduce the need for extensive human-annotated datasets in fine-tuning large language models, making the training process more efficient. This could lead to broader applications of LLMs in various domains where annotated data is scarce or expensive to obtain.
A Hybridizable Neural Time Integrator for Stable Autoregressive Forecasting
Time Series
Theory
Efficient ML
- Introduces a hybrid autoregressive transformer embedded in a mixed finite element framework for stable forecasting.
- Proves preservation of discrete energies and uniform gradient bounds, avoiding the exploding gradient problem.
- Achieves a 65× reduction in model parameters while outperforming state-of-the-art models in chaotic system forecasting.
- Demonstrates a 9,000× speedup in real-time simulations for a fusion component using only 12 training simulations.
Read more
A Hybridizable Neural Time Integrator for Stable Autoregressive Forecasting
Summary
This paper addresses the challenges of stability in autoregressive modeling of chaotic dynamical systems over long time horizons. The authors propose a hybrid method that integrates an autoregressive transformer within a novel shooting-based mixed finite element scheme, which ensures provable stability. The method preserves discrete energies for forward problems and maintains uniform bounds on gradients during training, effectively avoiding the exploding gradient problem. By combining this approach with a vision transformer, the authors achieve structure-preserving dynamics in latent tokens. The proposed model significantly reduces the number of parameters (by 65 times) while outperforming existing foundation models in long-horizon forecasting of chaotic systems. Additionally, a 'mini-foundation' model for a fusion component demonstrates that only 12 simulations are needed to train a real-time surrogate, resulting in a 9,000 times speedup compared to traditional particle-in-cell simulations. This work highlights the potential of incorporating physical structure into machine learning models to enhance stability and performance in scientific applications.
Methodology
The authors embed learned neural dynamics within a mixed finite element framework based on finite element exterior calculus (FEEC). This approach combines elements of geometric integrators and mortar methods to ensure stability and energy preservation. The model utilizes a vision transformer for end-to-end latent dynamics, allowing for efficient learning from sparse datasets.
Results
The proposed method successfully forecasts chaotic systems over 10,000 Lyapunov times, reproducing invariant measures beyond the capabilities of neural ODEs. It matches state-of-the-art accuracy on shear flow benchmarks with significantly fewer parameters and enables real-time design iterations for plasma physics simulations.
Implications
This research has significant implications for scientific modeling and simulation, particularly in fields requiring long-term forecasting of chaotic systems. The ability to achieve stability and efficiency with fewer data points can lead to advancements in real-time simulations and design optimizations in various engineering applications.
Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention
NLP
Large Language Models
Efficient ML
- Introduces Gist Sparse Attention (GSA) for efficient long-context modeling in LLMs.
- Combines learnable compression with selective unfolding to improve attention mechanisms.
- Achieves significant performance improvements over existing compression and sparse attention methods.
- Enables multi-resolution context access with reduced computational complexity.
Read more
Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention
Summary
This paper addresses the challenge of scaling large language models (LLMs) to handle long contexts efficiently, which is hindered by the quadratic computational cost of full attention mechanisms. The authors propose a novel approach that combines learnable compression with selective unfolding through a mechanism called Gist Sparse Attention (GSA). The key innovation is the use of interleaved gist compression tokens that summarize sets of raw tokens and serve as routing signals for sparse attention. The GSA process involves compressing the context into gist tokens, selecting the most relevant gists based on attention scores, and restoring the corresponding raw tokens for detailed attention. This coarse-to-fine mechanism allows for efficient access to both global representations and fine-grained details without requiring external retrieval modules. The authors also extend their framework hierarchically, enabling multi-resolution context access with logarithmic decoding complexity. Empirical evaluations on LongBench and RAG benchmarks show that the proposed method outperforms existing compression baselines and inference-time sparse attention methods across various compression ratios.
Methodology
The methodology involves compressing input contexts into interleaved gist tokens, which are then used to select relevant sub-contexts based on attention scores. The selected raw tokens are restored for detailed attention, creating a structured and efficient attention mechanism. The framework is trained end-to-end, avoiding the need for external modules.
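A minimal single-query sketch of this coarse-to-fine routing is shown below, with mean-pooled block summaries standing in for the learned gist tokens; the block size, the number of selected gists, and the function name are illustrative choices rather than the paper's configuration.

```python
import torch
import torch.nn.functional as F

def gist_sparse_attention(q, k, v, block=8, top_gists=2):
    """Single-query, coarse-to-fine sketch. (1) Summarise each block of keys
    with one coarse vector (mean pooling stands in for a learned gist token).
    (2) Route: keep only the blocks whose summaries score highest against q.
    (3) Unfold the raw tokens of those blocks and attend to them exactly.
    Shapes: q [d]; k, v [T, d] with T divisible by `block`."""
    T, d = k.shape
    kb, vb = k.view(T // block, block, d), v.view(T // block, block, d)
    gists = kb.mean(dim=1)                              # coarse summaries, one per block
    idx = (gists @ q).topk(top_gists).indices           # selection driven by gist scores
    k_sel, v_sel = kb[idx].reshape(-1, d), vb[idx].reshape(-1, d)
    attn = F.softmax(k_sel @ q / d ** 0.5, dim=0)       # exact attention on the unfolded subset
    return attn @ v_sel

out = gist_sparse_attention(torch.randn(16), torch.randn(64, 16), torch.randn(64, 16))
```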
Results
The proposed GSA method consistently outperformed other compression baselines and inference-time sparse attention techniques across various compression ratios (8× to 32×) on LongBench and RAG benchmarks, demonstrating improved accuracy and efficiency.
Implications
This work has significant implications for the development of next-generation LLMs, particularly in applications requiring long-context understanding, such as in-depth reasoning, code generation, and multi-turn dialogue systems. The proposed method could lead to more efficient model training and inference, enabling broader applications in real-world scenarios.
An effective variant of the Hartigan $k$-means algorithm
Optimization
Theory
Efficient ML
- Smartigan improves upon Hartigan's k-means algorithm by an additional 2-5%.
- The algorithm encourages exploration of the clustering space, particularly beneficial in high-dimensional settings.
- Smartigan-stability provides a strong guarantee for cluster assignments, enhancing the robustness of the clustering process.
- Empirical results confirm the superiority of Smartigan over traditional Lloyd's and Hartigan's algorithms.
Read more
An effective variant of the Hartigan $k$-means algorithm
Summary
This paper presents a novel variant of Hartigan's k-means algorithm, referred to as 'Smartigan', which aims to improve clustering performance. The authors highlight that while Hartigan's algorithm generally outperforms Lloyd's algorithm by 5-10%, their proposed modification can yield an additional improvement of 2-5%, especially as the dimensionality or the number of clusters increases. The Smartigan algorithm introduces a slight variation in the evaluation order of points, utilizing a random permutation that enhances exploration during the clustering process. The authors provide a detailed description of the algorithm, emphasizing its similarity to Hartigan's method while also incorporating mechanisms that encourage better exploration of the clustering space. Theoretical guarantees are discussed, showing that Smartigan-stability implies Hartigan-stability, which in turn implies Lloyd-stability. The paper concludes with empirical results demonstrating the effectiveness of Smartigan over both Lloyd's and Hartigan's algorithms.
Methodology
The authors propose the Smartigan algorithm, which modifies Hartigan's method by changing the order in which points are evaluated during clustering. This involves a fresh random permutation of points at each sweep, which enhances exploration. The algorithm iteratively reassigns points to clusters using a modified criterion that balances exploration and exploitation, and it terminates in a Smartigan-stable assignment, a guarantee stronger than Hartigan- or Lloyd-stability.
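The sketch below shows one Hartigan-style sweep that visits points in a fresh random permutation, the ingredient the summary highlights; the move test is the standard size-weighted Hartigan improvement check, and the paper's full exploration criterion is not reproduced here.

```python
import numpy as np

def smartigan_style_pass(X, assign, rng):
    """One Hartigan-style sweep in a fresh random visit order. A point moves
    only if the size-weighted Hartigan test shows a strict decrease in the
    total within-cluster cost; centroids and counts are updated incrementally."""
    k = assign.max() + 1
    counts = np.bincount(assign, minlength=k).astype(float)
    centroids = np.array([X[assign == j].mean(axis=0) for j in range(k)])
    moved = False
    for i in rng.permutation(len(X)):
        a = assign[i]
        if counts[a] <= 1:
            continue
        d2 = ((centroids - X[i]) ** 2).sum(axis=1)
        remove_gain = counts[a] / (counts[a] - 1) * d2[a]   # cost drop if i leaves a
        add_cost = counts / (counts + 1) * d2                # cost rise if i joins j
        add_cost[a] = np.inf                                 # staying put is not a move
        b = int(add_cost.argmin())
        if add_cost[b] < remove_gain:                        # strict improvement: relocate i
            for j, s in ((a, -1), (b, +1)):                  # incremental centroid/count updates
                centroids[j] = (centroids[j] * counts[j] + s * X[i]) / (counts[j] + s)
                counts[j] += s
            assign[i], moved = b, True
    return moved

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
assign = rng.integers(0, 4, size=200)
while smartigan_style_pass(X, assign, rng):
    pass
```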
Results
The empirical evaluations demonstrate that Smartigan consistently outperforms both Lloyd's and Hartigan's algorithms, particularly in scenarios with higher dimensions or larger numbers of clusters. The improvements are statistically significant, showcasing the effectiveness of the proposed modifications.
Implications
The findings suggest that Smartigan could be widely applicable in various clustering tasks, especially in high-dimensional data scenarios where traditional methods may struggle. This could enhance performance in fields such as image processing, market segmentation, and any domain requiring efficient clustering of large datasets.
Probably Approximately Consensus: On the Learning Theory of Finding Common Ground
Theory
Optimization
- Introduces a formal definition for passive 1D interval-based consensus finding.
- Develops an efficient Empirical Risk Minimization (ERM) algorithm.
- Establishes PAC learning guarantees, including sample complexity bounds.
- Demonstrates the effectiveness of selective querying strategies in reducing query numbers.
Read more
Probably Approximately Consensus: On the Learning Theory of Finding Common Ground
Summary
This paper addresses the challenge of identifying consensus in online deliberation platforms by modeling consensus as an interval in a one-dimensional opinion space. The authors propose a method that incorporates the salience of topics, allowing for a more meaningful representation of collective sentiment. They define an objective that maximizes expected agreement within a hypothesis interval based on an underlying distribution of issues. An efficient Empirical Risk Minimization (ERM) algorithm is introduced, along with PAC-learning guarantees. The methodology focuses on passive learning, where the goal is to identify an interval representing maximum consensus from a sample of user preferences. Initial experiments validate the algorithm's performance and explore strategies for efficiently querying users to optimize consensus identification. The findings suggest that selectively querying users can significantly reduce the number of necessary queries, enhancing the practicality of the approach.
Methodology
The authors model consensus as a one-dimensional interval derived from high-dimensional data through embedding and dimensionality reduction. They employ an Empirical Risk Minimization (ERM) approach to maximize expected agreement within a hypothesis interval, incorporating salience through an underlying distribution of issues. The study focuses on passive learning, analyzing a sample of user preferences to identify the optimal consensus region.
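A minimal ERM sketch is given below under the simplifying assumption that each sample is a (position, agree/disagree) pair on a one-dimensional opinion axis; the brute-force endpoint enumeration is the generic interval learner, not necessarily the authors' algorithm, and the salience weighting is omitted.

```python
import numpy as np

def erm_interval(x, y):
    """ERM over 1D interval hypotheses: x are opinion positions, y in {0, 1}
    marks agreement. Returns the interval [a, b] (endpoints drawn from the
    sample, plus sentinels) with the fewest misclassifications, i.e. the
    empirical consensus region. O(n^2) for clarity."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    candidates = np.concatenate(([x[0] - 1.0], x, [x[-1] + 1.0]))
    best = (np.inf, None)
    for i, a in enumerate(candidates):
        for b in candidates[i:]:
            inside = (x >= a) & (x <= b)
            err = np.sum(inside != (y == 1))    # disagreements with the hypothesis interval
            if err < best[0]:
                best = (err, (a, b))
    return best[1]

rng = np.random.default_rng(1)
pos = rng.uniform(-1, 1, 200)
label = ((pos > -0.2) & (pos < 0.5)).astype(int)        # latent consensus region
label = label ^ (rng.random(200) < 0.05)                # 5% label noise
print(erm_interval(pos, label))
```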
Results
The proposed algorithm shows promising performance in identifying consensus regions, with experimental results indicating that selective querying can significantly reduce the number of queries needed to achieve practical consensus identification. The PAC learning analysis provides theoretical backing for the algorithm's efficiency and effectiveness.
Implications
The findings have potential applications in online deliberation platforms, enhancing the ability to distill areas of agreement from complex discussions. The approach can inform group decision-making processes by identifying more relevant consensus statements based on user preferences and topic salience.
Geometric Characterisation and Structured Trajectory Surrogates for Clinical Dataset Condensation
Theory
Efficient ML
Optimization
- Introduces a geometric characterization of trajectory matching in dataset condensation.
- Identifies a representability bottleneck in traditional trajectory matching methods.
- Proposes Bézier Trajectory Matching (BTM) to improve the efficiency of dataset condensation.
- Demonstrates that BTM outperforms standard trajectory matching in clinical datasets, especially in challenging settings.
Read more
Geometric Characterisation and Structured Trajectory Surrogates for Clinical Dataset Condensation
Summary
This paper addresses the challenge of dataset condensation, particularly in clinical settings where large datasets are difficult to manage and share. The authors provide a geometric characterization of trajectory matching (TM), a common approach for dataset condensation that uses changes in model parameters during training on real data to supervise synthetic data. They identify a limitation in TM, where a fixed synthetic dataset can only reproduce a limited range of training-induced parameter changes, leading to a representability bottleneck when the supervision signal is spectrally broad. To overcome this issue, the authors propose Bézier Trajectory Matching (BTM), which utilizes quadratic Bézier trajectory surrogates instead of traditional stochastic gradient descent (SGD) trajectories. This method optimizes the path between initial and final model states to reduce average loss and align better with the constraints of a fixed synthetic dataset. The experiments conducted on five clinical datasets show that BTM consistently matches or improves upon standard TM, particularly in low-prevalence and low-synthetic-budget scenarios. The findings suggest that structuring the supervision signal is crucial for effective trajectory matching, rather than merely replicating stochastic optimization paths.
Methodology
The authors conducted a geometric analysis of trajectory matching to understand the limitations of traditional methods. They proposed Bézier Trajectory Matching (BTM), which replaces SGD trajectories with optimized quadratic Bézier surrogates. The surrogates are designed to minimize average loss along the trajectory while providing a more structured supervision signal. Experiments were performed on five clinical datasets to evaluate the performance of BTM compared to standard TM.
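To make the surrogate concrete, the sketch below evaluates a quadratic Bézier path between the initial and final parameter vectors and fits its single control point to minimise the average loss along the path. The `avg_loss` callback, the optimiser settings, and the toy quadratic surface are assumptions, and the matching loss that actually supervises the synthetic data is not shown.

```python
import torch

def bezier_point(theta0, theta1, ctrl, t):
    """Quadratic Bezier surrogate between the initial and final parameter
    vectors theta0, theta1, with a free control point ctrl; t in [0, 1]."""
    return (1 - t) ** 2 * theta0 + 2 * (1 - t) * t * ctrl + t ** 2 * theta1

def fit_control_point(theta0, theta1, avg_loss, steps=200, lr=1e-2, n_t=8):
    """Optimise the control point so the average loss along the surrogate
    trajectory is small (the role BTM assigns to the surrogate). avg_loss
    is any differentiable stand-in for the training loss on real data."""
    ctrl = ((theta0 + theta1) / 2).clone().requires_grad_(True)
    opt = torch.optim.Adam([ctrl], lr=lr)
    ts = torch.linspace(0.0, 1.0, n_t)
    for _ in range(steps):
        loss = torch.stack([avg_loss(bezier_point(theta0, theta1, ctrl, t)) for t in ts]).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return ctrl.detach()

# Toy usage with a quadratic loss surface (a hypothetical stand-in).
theta0, theta1 = torch.zeros(10), torch.ones(10)
ctrl = fit_control_point(theta0, theta1, avg_loss=lambda th: (th ** 2).sum())
```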
Results
The results indicate that BTM consistently matches or improves upon the performance of standard trajectory matching across all datasets tested. The most significant improvements were observed in low-prevalence and low-synthetic-budget scenarios, suggesting that BTM is particularly effective in settings where traditional methods struggle.
Implications
The findings of this paper have important implications for the development of efficient machine learning models in healthcare and other governed domains. By improving dataset condensation techniques, the proposed methods can facilitate better model training and research while addressing data sharing and governance challenges.
Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning
NLP
Large Language Models
Efficient ML
- TaNOS framework improves numerical reasoning robustness by addressing reasoning inefficiency, data scarcity, and header dependency.
- Operation sketches help models focus on contextual reasoning rather than surface-level arithmetic.
- Self-supervised learning allows for the construction of program-question pairs without manual annotation, enhancing data efficiency.
- Header anonymization reduces reliance on specific lexical cues, promoting better schema generalization.
Read more
Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning
Summary
This paper addresses the challenges of numerical reasoning in expert-domain tables, which often show high in-domain accuracy but struggle with domain shifts. The authors introduce TaNOS, a continual pre-training framework designed to enhance the robustness of numerical reasoning by decoupling domain semantics from numerical operation structures. TaNOS consists of three main components: header anonymization to reduce lexical memorization, operation sketches that provide minimal structural cues, and self-supervised learning that generates correctness-guaranteed program-question pairs from tables. By focusing on structural reasoning rather than surface-level patterns, TaNOS significantly improves the transferability of numerical reasoning across different datasets. The framework was tested on an 8B instruction-tuned model, achieving 80.13% execution accuracy on the FinQA benchmark with only 10% of the training data, outperforming a fully supervised fine-tuning baseline and proprietary models like GPT-5. Additionally, TaNOS demonstrated a negligible cross-domain performance gap, indicating its effectiveness in maintaining robust generalization across diverse expert-domain tables.
Methodology
The authors developed TaNOS, which integrates three mechanisms: operation sketches to provide structural cues, self-supervised learning to generate program-question pairs from unlabeled tables, and header anonymization to mitigate header dependency. This approach aims to enhance the model's ability to generalize across different datasets by focusing on structural reasoning rather than lexical associations.
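Of the three mechanisms, header anonymization is the easiest to picture: domain-specific column names are replaced with neutral placeholders so the model cannot key on lexical cues. The toy below illustrates this; the placeholder format and the example table are assumptions, not the paper's scheme.

```python
import pandas as pd

def anonymize_headers(df: pd.DataFrame):
    """Replace domain-specific column names with neutral placeholders and
    return both the renamed table and the mapping back to the original schema."""
    mapping = {c: f"col_{i}" for i, c in enumerate(df.columns)}
    return df.rename(columns=mapping), mapping

table = pd.DataFrame({"net_revenue": [120, 95], "operating_cost": [80, 60]})
anon, mapping = anonymize_headers(table)
print(anon.columns.tolist(), mapping)
```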
Results
TaNOS achieved 80.13% execution accuracy on the FinQA benchmark with only 10% of the training data, surpassing the fully supervised fine-tuning baseline (73.97%) and proprietary models. In domain-shift experiments, TaNOS maintained a performance gap of less than 2 percentage points, compared to over 10 percentage points for standard supervised fine-tuning.
Implications
The findings suggest that incorporating structural guidance and self-supervised learning can significantly enhance the robustness and transferability of numerical reasoning models in various expert domains. This has potential applications in fields such as finance, engineering, and biology, where accurate numerical reasoning over tabular data is critical.
Low-Rank Adaptation Redux for Large Models
Large Language Models
Optimization
Efficient ML
- LoRA is a leading method for parameter-efficient fine-tuning of large models, significantly reducing computational and memory costs.
- The paper categorizes advancements in LoRA into architectural design, efficient optimization, and diverse applications.
- Classical signal processing tools provide valuable insights for improving LoRA methods and addressing challenges in deep learning.
- The authors advocate for a systematic approach to LoRA design, informed by first-principles guidelines from signal processing.
Read more
Low-Rank Adaptation Redux for Large Models
Summary
This paper revisits Low-Rank Adaptation (LoRA), a prominent method for parameter-efficient fine-tuning (PEFT) of large foundation models, through the lens of signal processing (SP). LoRA allows for the adaptation of billion-parameter networks with minimal computational and memory overhead by introducing small trainable matrices that augment existing model parameters. The authors categorize recent advancements in LoRA into three axes: architectural design, efficient optimization, and applications. They discuss how techniques such as singular value decomposition (SVD), rank-augmentation, and gauge-invariant optimization can enhance LoRA's effectiveness. The paper emphasizes the importance of bridging classical SP tools with modern deep learning challenges, suggesting that SP principles can inform the development of more principled PEFT methods. The authors also outline open research directions that could benefit both the SP and deep learning communities, highlighting the need for a systematic understanding of LoRA's design and application in real-world scenarios.
Methodology
The authors review and categorize LoRA advancements based on architectural designs and optimization techniques, leveraging principles from signal processing such as SVD and matrix decompositions to enhance the understanding and effectiveness of LoRA methods.
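For readers new to LoRA itself, a minimal PyTorch adapter layer captures the core mechanism the survey builds on: the pretrained weight stays frozen while a rank-r update BA, scaled by α/r, is trained. The hyperparameter values below are common defaults, not recommendations from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter: the frozen base weight W is augmented by a
    trainable low-rank update (alpha / r) * B @ A, so only r * (d_in + d_out)
    parameters are trained instead of d_in * d_out."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # zero init: starts as the base model
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(4, 768))
```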
Results
The paper does not present empirical results but rather focuses on theoretical insights and categorizations of LoRA advancements, emphasizing the potential for improved fine-tuning methods through the integration of signal processing techniques.
Implications
The findings suggest that integrating signal processing principles with deep learning can lead to more efficient and effective fine-tuning methods for large models, potentially democratizing access to advanced AI capabilities across various domains.
Absorber LLM: Harnessing Causal Synchronization for Test-Time Training
Large Language Models
Efficient ML
NLP
- Absorber LLM preserves causal relationships between historical contexts and future inferences.
- The method optimizes context absorption through self-supervised causal synchronization.
- Absorber LLM outperforms traditional transformers and prior parameter memory methods in both efficiency and accuracy.
- The approach enables scalable inference in real-world applications involving long-context data.
Read more
Absorber LLM: Harnessing Causal Synchronization for Test-Time Training
Summary
The paper introduces Absorber LLM, a novel approach aimed at addressing the high computational costs associated with transformers, particularly when processing long sequences. Traditional transformer models utilize self-attention mechanisms that exhibit quadratic complexity, leading to significant memory consumption during inference. While alternatives like RNNs and SSMs reduce computational overhead, they sacrifice the ability to retain long-tail dependencies. The authors propose a self-supervised causal synchronization method that allows a contextless model to absorb historical contexts into its parameters while preserving the causal relationships necessary for future inferences. This method ensures that the updated model behaves similarly to the original full-context model, thereby enhancing generalization and reducing memory usage. The paper validates the effectiveness of Absorber LLM through experiments on long-context and streaming benchmarks, demonstrating improved accuracy and efficiency compared to existing parameter-as-memory approaches.
Methodology
The authors develop Absorber LLM by formulating a self-supervised optimization objective that synchronizes the internal behaviors of a contextless model with those of a full-context model. This involves training the contextless model to replicate the output of the original model on future generations, thereby ensuring that the absorbed context retains its semantic and causal influence.
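Because the paper describes training the contextless model to replicate the full-context model's behaviour on future tokens, a distillation-style matching loss is a natural mental model. The sketch below uses a temperature-scaled KL between next-token distributions; it is an illustrative stand-in, not the authors' exact synchronization objective.

```python
import torch
import torch.nn.functional as F

def synchronization_loss(student_logits, teacher_logits, temperature=1.0):
    """Match the contextless (student) model's next-token distributions on a
    continuation to those of the full-context (teacher) model, so the absorbed
    context keeps its causal influence. Both tensors: [T, vocab]."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.log_softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (t * t)

# Hypothetical usage: the teacher saw [context + continuation], the student saw
# only [continuation]; only the student's parameters receive gradients.
teacher_logits = torch.randn(32, 1000)
student_logits = torch.randn(32, 1000, requires_grad=True)
loss = synchronization_loss(student_logits, teacher_logits.detach())
loss.backward()
```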
Results
Experiments show that Absorber LLM significantly reduces inference memory requirements while achieving higher accuracy on long-context benchmarks compared to traditional transformer models and other parameter memory methods.
Implications
The proposed method has potential applications in scenarios requiring efficient processing of long streams of data, such as conversational AI, real-time data analysis, and continuous learning systems, where maintaining context over extended interactions is crucial.
Graph Neural Network-Informed Predictive Flows for Faster Ford-Fulkerson and PAC-Learnability
Graph Learning
Optimization
Computer Vision
- Integration of GNNs with the Ford-Fulkerson algorithm to improve max-flow computation speed.
- Introduction of a Message Passing Graph Neural Network (MPGNN) that learns edge importance probabilities.
- Development of a modified Ford-Fulkerson procedure that prioritizes high-value augmenting paths.
- Theoretical framework connecting prediction quality to algorithmic efficiency.
Read more
Graph Neural Network-Informed Predictive Flows for Faster Ford-Fulkerson and PAC-Learnability
Summary
This paper presents a novel framework that integrates Graph Neural Networks (GNNs) with the Ford-Fulkerson algorithm to enhance the efficiency of max-flow computations and image segmentation tasks. The authors propose a Message Passing Graph Neural Network (MPGNN) architecture that learns edge importance probabilities, which guide the selection of augmenting paths in the Ford-Fulkerson algorithm. By constructing a grid-based flow network from input images, the MPGNN assigns probabilities to edges based on their likelihood of being part of high-capacity cuts. This information is utilized to prioritize augmenting paths, thereby reducing the number of augmentations needed while maintaining the optimality of the max-flow/min-cut solution. The paper also introduces a theoretical framework relating the quality of predictions to algorithmic efficiency and discusses a hybrid approach that combines flow warm-starting with edge-priority predictions. The results indicate that the proposed method significantly accelerates the Ford-Fulkerson algorithm without compromising its correctness, establishing a foundation for learning-guided combinatorial optimization in image segmentation.
Methodology
The authors developed a Message Passing Graph Neural Network (MPGNN) that learns node and edge embeddings through a mutually dependent update mechanism. They constructed a grid-based flow network from input images, performed GNN inference to assign edge probabilities, and modified the Ford-Fulkerson algorithm to prioritize augmenting paths based on these probabilities. A bidirectional path construction strategy was also introduced, along with a theoretical analysis of prediction quality.
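The augmenting-path modification is easy to sketch: below, a best-first search over the residual graph prefers paths whose worst edge has the highest predicted probability, with `prob` standing in for the MPGNN's edge scores. The graph encoding, the default probability for unscored edges, and the tie-handling are assumptions; the residual bookkeeping keeps the returned flow value exact.

```python
import heapq
from collections import defaultdict

def prob_guided_max_flow(cap, prob, s, t):
    """Ford-Fulkerson in which each augmenting path is found by a best-first
    search that prefers edges with high predicted probability. cap[u][v] are
    integer capacities; prob[(u, v)] are edge scores in [0, 1]."""
    res = defaultdict(lambda: defaultdict(int))
    adj = defaultdict(set)
    for u in cap:
        for v, c in cap[u].items():
            res[u][v] += c
            adj[u].add(v); adj[v].add(u)          # reverse residual edges start at 0
    total = 0
    while True:
        # best-first search: keep the worst edge probability along the path as high as possible
        heap, parent = [(-1.0, s)], {s: None}
        while heap:
            neg_worst, u = heapq.heappop(heap)
            if u == t:
                break
            for v in adj[u]:
                if res[u][v] > 0 and v not in parent:
                    parent[v] = u
                    heapq.heappush(heap, (max(neg_worst, -prob.get((u, v), 0.5)), v))
        if t not in parent:
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v)); v = parent[v]
        push = min(res[u][v] for u, v in path)     # bottleneck capacity
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        total += push

# Hypothetical toy network; the edge probabilities stand in for GNN outputs.
cap = {"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 3}}
print(prob_guided_max_flow(cap, {("s", "a"): 0.9, ("a", "t"): 0.9}, "s", "t"))
```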
Results
The proposed GNN-informed approach significantly reduced the number of augmentations required in the Ford-Fulkerson algorithm while preserving the optimality of the max-flow/min-cut solution. The method demonstrated improved runtime efficiency in image segmentation tasks, validating the effectiveness of learned predictions in guiding combinatorial optimization.
Implications
This work lays the groundwork for leveraging learning-based methods in combinatorial optimization problems, particularly in applications such as image segmentation. The integration of GNNs can lead to more efficient algorithms in various domains where flow computations are critical.
Validating a Deep Learning Algorithm to Identify Patients with Glaucoma using Systemic Electronic Health Records
Efficient ML
- The study demonstrates the transportability of a pretrained deep learning model for glaucoma risk assessment using EHR data.
- The model achieved an AUROC of 0.883 and a PPV of 0.657, indicating effective identification of glaucoma patients.
- Calibration of the model predictions aligned with clinical outcomes, enhancing its potential for practical application.
- Fine-tuning the model on local data improved its performance, highlighting the importance of adapting models to specific health systems.
Read more
Validating a Deep Learning Algorithm to Identify Patients with Glaucoma using Systemic Electronic Health Records
Summary
This study evaluates the effectiveness of a glaucoma risk assessment (GRA) model trained on the All of Us national dataset in identifying patients at high risk for glaucoma using only systemic electronic health records (EHR) from an independent institution. The cross-sectional study included 20,636 patients from the Stanford Byers Eye Clinic, with 15% diagnosed with glaucoma. The pretrained GRA model was fine-tuned on this cohort and tested using various systemic health data inputs. The best-performing model achieved an area under the receiver operating characteristic curve (AUROC) of 0.883 and a positive predictive value (PPV) of 0.657, indicating a strong ability to identify glaucoma cases. The model's calibration was consistent with clinical risk, showing that the highest prediction decile had a 65.7% glaucoma diagnosis rate. Performance improved with the addition of trainable layers and more data. This EHR-only GRA model could facilitate scalable and accessible pre-screening for glaucoma, potentially improving early detection without the need for specialized imaging.
Methodology
The study utilized a cross-sectional design, analyzing EHR data from 20,636 patients at Stanford. The GRA model, initially trained on the All of Us dataset, was fine-tuned on the Stanford cohort. Systemic health data, including demographics, diagnoses, medications, and lab results, were used as inputs. The model's performance was evaluated using AUROC and PPV metrics, and calibration was assessed against clinical outcomes.
Results
The best model achieved an AUROC of 0.883 and a PPV of 0.657. The highest prediction decile indicated a 65.7% glaucoma diagnosis rate and a 57.0% treatment rate. Performance improved with additional trainable layers and data, demonstrating the model's adaptability and effectiveness in identifying glaucoma risk.
Implications
The findings suggest that an EHR-based glaucoma risk assessment model could enhance early detection and screening efficiency in primary care settings. This approach may reduce the burden of undiagnosed glaucoma and improve patient outcomes by prioritizing those at higher risk for specialized evaluations.
Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding
Computer Vision
Reinforcement Learning
Multimodal
- Introduction of a Propose-then-Critic framework for GUI grounding.
- Utilization of a co-evolutionary reinforcement learning strategy to enhance model capabilities.
- Dynamic maturity mechanism to balance prediction accuracy and candidate diversity.
- Significant improvements in grounding accuracy and critic reliability.
Read more
Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding
Summary
This paper addresses the challenge of Graphical User Interface (GUI) grounding, which involves mapping natural language instructions to precise pixel coordinates on a screen. Traditional methods often struggle with localization due to visually similar elements and complex layouts. The authors propose a novel Propose-then-Critic framework that replaces static self-consistency strategies with a learnable selection mechanism. This mechanism critiques its own proposals based on visual feedback, allowing for improved localization. The framework employs a co-evolving reinforcement learning paradigm that dynamically balances the training objectives of the proposer and critic, enhancing both the diversity of proposals and the critic's ability to discriminate between them. This mutual reinforcement fosters generalizability across diverse interface layouts. Extensive experiments demonstrate that the proposed method significantly improves grounding accuracy and critic reliability across six benchmarks, achieving up to a 17.2% relative improvement in grounding capability.
Methodology
The authors developed a Propose-then-Critic framework that shifts GUI grounding from a single-pass regression task to a learnable Visual Perception Ranking paradigm. They introduced a co-evolutionary reinforcement learning strategy that adapts the training focus between the proposer and critic, allowing them to mutually enhance each other's capabilities. A maturity-aware mechanism was implemented to guide the learning process from basic localization to expansive spatial exploration.
Results
The proposed method achieved a relative improvement of up to 17.2% in grounding capability compared to existing methods. It demonstrated substantial enhancements in both the generation accuracy of proposals and the reliability of the critic across six benchmark datasets.
Implications
This research has significant implications for the development of autonomous GUI agents, improving their ability to accurately interpret user instructions and interact with complex digital environments. The framework could be applied to various applications in digital automation, enhancing user experience and efficiency.
Fine-Tuning Regimes Define Distinct Continual Learning Problems
Theory
Optimization
- Fine-tuning regimes are crucial evaluation variables in continual learning.
- Changing the trainable depth affects the optimization geometry and update signals.
- Empirical results show that method rankings can vary significantly across different regimes.
- Deeper adaptation regimes correlate with higher forgetting and larger update magnitudes.
Read more
Fine-Tuning Regimes Define Distinct Continual Learning Problems
Summary
This paper investigates the impact of fine-tuning regimes on continual learning (CL) methods, arguing that the choice of trainable parameters significantly influences the evaluation of these methods. The authors formalize adaptation regimes as constrained optimization over fixed parameter subspaces, demonstrating that varying the trainable depth alters the effective update signals for both task fitting and knowledge retention. Through empirical analysis across five benchmark datasets (MNIST, Fashion MNIST, KMNIST, QMNIST, and CIFAR-100) and four CL methods (online EWC, LwF, SI, and GEM), the study reveals that the relative performance of these methods varies significantly with different fine-tuning setups. The findings suggest that deeper adaptation regimes lead to larger update magnitudes and increased forgetting, highlighting the need for regime-aware evaluation protocols in CL research.
Methodology
The authors conducted a systematic empirical study comparing four standard continual learning methods (online EWC, LwF, SI, and GEM) across five trainable depth regimes. They evaluated these methods on five benchmark datasets, analyzing the impact of varying the trainable subspace on the performance and behavior of the algorithms.
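In practice a "trainable depth regime" is simply a constraint on which parameters receive gradients. The helper below freezes everything except the last `depth` blocks of a toy model; the model and the notion of a "block" are placeholders, not the architectures used in the study.

```python
import torch.nn as nn

def set_trainable_depth(model: nn.Sequential, depth: int):
    """Restrict optimisation to a fixed parameter subspace: only the last
    `depth` blocks stay trainable, everything earlier is frozen. Varying
    `depth` across runs reproduces different adaptation regimes."""
    blocks = list(model.children())
    for i, block in enumerate(blocks):
        trainable = i >= len(blocks) - depth
        for p in block.parameters():
            p.requires_grad = trainable
    return [p for p in model.parameters() if p.requires_grad]

model = nn.Sequential(
    nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10),
)
trainable_params = set_trainable_depth(model, depth=2)   # e.g. a head-only regime
```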
Results
The study found that the relative ranking of CL methods is not consistent across different fine-tuning regimes. Specifically, deeper adaptation regimes were associated with larger update magnitudes and higher rates of forgetting, indicating that the choice of trainable depth significantly influences the performance of continual learning algorithms.
Implications
The findings suggest that researchers and practitioners in continual learning should consider the fine-tuning regime as a critical factor in method evaluation. This could lead to more robust conclusions about the effectiveness of different CL methods and inform the design of future algorithms and benchmarks.
Transferable SCF-Acceleration through Solver-Aligned Initialization Learning
Optimization
Efficient ML
Theory
- SAIL improves the quality of initial guesses for SCF solvers by training on solver dynamics.
- The Effective Relative Iteration Count (ERIC) is introduced as a more accurate performance metric.
- SAIL achieves significant reductions in ERIC across various molecular sizes, outperforming previous methods.
- The method extends machine learning SCF acceleration to larger drug-like molecules, enhancing computational efficiency.
Read more
Transferable SCF-Acceleration through Solver-Aligned Initialization Learning
Summary
This paper addresses the challenge of accelerating Kohn-Sham density functional theory (KS-DFT) calculations, which are computationally intensive due to the iterative nature of self-consistent field (SCF) solvers. Traditional machine learning methods for predicting initial guesses from molecular geometries often fail when extrapolating to larger molecules, leading to slower convergence. The authors propose a novel approach called Solver-Aligned Initialization Learning (SAIL), which differentiates through the SCF solver end-to-end, allowing for training on solver dynamics rather than ground-state references. This method resolves the supervision issues that degrade performance in out-of-distribution scenarios. The authors introduce the Effective Relative Iteration Count (ERIC) as a new metric to evaluate performance, showing that SAIL significantly reduces ERIC across various molecular sizes, achieving reductions of 37% (PBE), 33% (SCAN), and 27% (B3LYP) on the QM40 dataset. Furthermore, SAIL demonstrates a 1.25× wall-time speedup on larger drug-like molecules, thus extending the applicability of machine learning in SCF acceleration while maintaining accuracy.
Methodology
The authors developed Solver-Aligned Initialization Learning (SAIL), which involves backpropagating through the SCF algorithm to optimize initial guesses based on solver dynamics rather than traditional ground-state targets. This approach is label-free and relies solely on molecular geometries, allowing for effective training even for larger molecules.
Results
SAIL reduced the Effective Relative Iteration Count (ERIC) by 37% for PBE, 33% for SCAN, and 27% for B3LYP on the QM40 dataset, which includes molecules up to four times larger than those in the training set. Additionally, SAIL provided a 1.25× wall-time speedup for larger drug-like molecules, significantly improving computational efficiency compared to previous state-of-the-art methods.
Implications
The findings suggest that SAIL can be a transformative approach for accelerating SCF calculations in computational chemistry, enabling more efficient simulations of larger and more complex molecular systems. This has potential applications in drug discovery and materials science, where computational resources are critical.
Early Detection of Latent Microstructure Regimes in Limit Order Books
Time Series
Theory
- Introduces a causal regime model for LOBs with identifiable latent build-up phases.
- Derives theoretical guarantees for early detection of stress onset.
- Proposes a novel trigger-based detector with formal foundations.
- Demonstrates superior performance over traditional detection methods in simulations.
Read more
Early Detection of Latent Microstructure Regimes in Limit Order Books
Summary
This paper addresses the challenge of early detection of stress in limit order books (LOBs), which can transition rapidly from stable to stressed conditions. Traditional early-warning signals are reactive, responding only after stress has begun, which limits their effectiveness. The authors propose a three-regime causal data-generating process (DGP) consisting of stable, latent build-up, and stress phases, where the latent build-up phase can be identified under certain conditions. They derive two theoretical guarantees: a sufficient drift-to-noise condition for positive expected lead-time and a lower bound on the probability of detection before stress onset. A novel trigger-based detector is introduced, which combines MAX aggregation of uncertainty and drift channels, a rising-edge condition, and an adaptive threshold. The method was rigorously evaluated through 200 simulation runs, achieving a mean lead-time of +18.6 timesteps, with high precision and moderate coverage. A preliminary application on real BTC/USDT order book data demonstrated a mean lead-time of +38 seconds, indicating the practical applicability of the method. The findings suggest that the proposed approach can effectively identify latent stress signals in LOBs, outperforming existing methods.
Methodology
The authors developed a three-regime causal data-generating process to model the transitions in limit order books. They derived theoretical propositions regarding detection lead-time and detection probability based on drift-to-noise ratios. A trigger-based detector was designed, utilizing MAX aggregation of uncertainty and drift channels, along with an adaptive threshold mechanism. The methodology was validated through extensive simulations and a preliminary real-data application.
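A compact version of the trigger logic is easy to write down: MAX-fuse the two channels, compare against a rolling adaptive threshold, and raise an alarm only on a rising edge. The window length, the mean-plus-k·std threshold, and the synthetic build-up in the demo are illustrative choices, not the paper's calibration.

```python
import numpy as np

def trigger_detector(uncertainty, drift, window=50, k=3.0):
    """Fuse the uncertainty and drift channels by a pointwise MAX, compare the
    fused signal against an adaptive threshold (rolling mean + k * rolling std),
    and fire only on a rising edge (below threshold at t-1, above at t)."""
    fused = np.maximum(uncertainty, drift)
    alarms, above_prev = [], False
    for t in range(window, len(fused)):
        hist = fused[t - window:t]
        threshold = hist.mean() + k * hist.std()
        above = fused[t] > threshold
        if above and not above_prev:          # rising edge
            alarms.append(t)
        above_prev = above
    return alarms

rng = np.random.default_rng(0)
u, d = rng.normal(0, 1, 500), rng.normal(0, 1, 500)
u[400:] += np.linspace(0, 6, 100)             # synthetic latent build-up before "stress"
print(trigger_detector(u, d))
```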
Results
The proposed method achieved a mean lead-time of +18.6 timesteps with a precision of 1.00 and coverage of 0.54 in simulations. In a preliminary application on BTC/USDT data, the detector achieved a mean lead-time of +38 seconds with a precision of 1.00 and coverage of 0.80. The analysis of missed detections revealed that they were concentrated in specific parameter settings, consistent with theoretical predictions.
Implications
The findings suggest that the proposed detection framework can significantly enhance the ability to identify latent stress in limit order books, potentially improving trading strategies and risk management in high-frequency trading environments. The theoretical foundations and empirical results pave the way for further research and development of real-time monitoring systems in financial markets.
MCAP: Deployment-Time Layer Profiling for Memory-Constrained LLM Inference
Large Language Models
Efficient ML
NLP
- MCAP enables load-time per-layer precision decisions, enhancing flexibility in model deployment.
- The method significantly increases decode throughput compared to existing systems.
- NVE can run larger models in constrained memory environments without performance loss.
- MCAP provides a single runtime signal that couples precision routing and memory placement decisions.
Read more
MCAP: Deployment-Time Layer Profiling for Memory-Constrained LLM Inference
Summary
The paper presents MCAP (Monte Carlo Activation Profiling), a novel approach for deploying large language models (LLMs) on memory-constrained hardware. Traditional methods of post-training quantization fix precision choices at calibration time, limiting the devices on which a model can run. MCAP shifts this decision to load time, allowing for dynamic per-layer precision adjustments based on a lightweight runtime signal derived from a 60-second profiling process using 12 calibration prompts. This enables a more flexible deployment across heterogeneous hardware. The authors also introduce NVE, a Rust+CUDA inference engine that utilizes the MCAP signal to optimize both precision dispatch and memory residency. The results demonstrate significant improvements in decode throughput and the ability to run larger models in constrained memory environments without observable degradation in performance. The method allows for the deployment of LLMs across previously infeasible memory regimes, enhancing accessibility for various hardware configurations.
Methodology
The authors developed MCAP, a 60-second, gradient-free profiler that generates a per-layer importance signal based on calibration prompts. This signal informs the NVE inference engine to make real-time decisions on layer precision (W4A8 vs. W4A16) and memory residency (GPU, RAM, SSD). The implementation spans multiple architectures and is designed to optimize both throughput and memory usage.
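The coupling of precision routing and placement can be pictured with a toy policy: rank layers by the profiled importance signal, keep the higher-precision activation path for the top fraction, and fill the GPU budget in importance order. The fraction, the per-layer sizes, and the two-tier placement below are invented for the example and are not NVE's actual policy.

```python
def route_layers(importance, gpu_budget_mb, layer_mb_w4a16, layer_mb_w4a8, hi_frac=0.25):
    """Toy load-time routing driven by a per-layer importance signal (the role
    MCAP's profile plays): the most important layers keep the W4A16 path, the
    rest drop to W4A8, and layers are placed on the GPU in importance order
    until the memory budget is exhausted (the remainder would be offloaded)."""
    order = sorted(range(len(importance)), key=lambda i: -importance[i])
    n_hi = max(1, int(hi_frac * len(importance)))
    plan, used = {}, 0.0
    for rank, i in enumerate(order):
        precision = "W4A16" if rank < n_hi else "W4A8"
        size = layer_mb_w4a16 if precision == "W4A16" else layer_mb_w4a8
        device = "gpu" if used + size <= gpu_budget_mb else "cpu_ram"
        used += size if device == "gpu" else 0.0
        plan[i] = (precision, device)
    return plan

# Hypothetical 8-layer importance profile and a 300 MB GPU budget.
print(route_layers([0.9, 0.2, 0.6, 0.1, 0.8, 0.3, 0.4, 0.7], 300, 60, 45))
```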
Results
NVE achieved 1.5–1.8× higher decode throughput than existing systems on NVIDIA T4 across various Llama models. It successfully ran Llama-3.2-3B in 2 GB and Llama-3.1-8B in 4 GB of memory, both without observable performance degradation. The method also demonstrated competitive accuracy on benchmark tasks compared to other quantization methods.
Implications
MCAP's approach allows for the deployment of large language models on a wider range of hardware, including consumer GPUs and mobile devices, which were previously limited by memory constraints. This could democratize access to advanced AI models and applications across various industries.
Relocation of compact sets in $\text{R}^n$ by diffeomorphisms and linear separability of datasets in $\text{R}^n$
Theory
- Establishes a theory for relocating compact sets in R^n using diffeomorphisms.
- Proves that a differentiable embedding exists to make compact datasets linearly separable in R^(n+1).
- Demonstrates that width-(n+1) deep neural networks can achieve linear separability for compact datasets.
- Connects concepts from differential topology with applications in deep learning.
Read more
Relocation of compact sets in $\text{R}^n$ by diffeomorphisms and linear separability of datasets in $\text{R}^n$
Summary
This paper explores the relocation of compact sets in n-dimensional manifolds through self-diffeomorphisms, emphasizing its relevance to data classification in data science. The authors establish a theoretical framework that allows for the relocation of a finite number of compact sets in R^n to arbitrary target domains using diffeomorphisms. A significant contribution is the proof that any collection of compact datasets can be made linearly separable through a differentiable embedding into R^(n+1). The paper further demonstrates that deep neural networks (DNNs) with specific activation functions (Leaky-ReLU, ELU, SELU) can achieve linear separability of these datasets under mild conditions. The study connects differential topology with deep learning, providing insights into how neural networks can untangle complex data structures to achieve linear classification.
Methodology
The authors utilize concepts from differential topology to develop a theoretical framework for relocating compact sets via diffeomorphisms. They rigorously prove the existence of differentiable embeddings that enable linear separability of datasets. The application of this theory is demonstrated through the design of deep neural networks with specific activation functions, showing how these networks can transform complex data structures into linearly separable forms.
Results
The main results indicate that for any finite collection of mutually disjoint compact datasets in R^n, there exists a width-(n+1) deep neural network that can make these datasets linearly separable in R^(n+1). The paper also shows that specific activation functions (Leaky-ReLU, ELU, SELU) are effective in achieving this separability under mild conditions.
Implications
The findings have significant implications for data classification in machine learning, particularly in enhancing the capabilities of deep neural networks to handle complex datasets. The theoretical insights may lead to improved methods for transforming non-linearly separable data into forms amenable to linear classification, thereby broadening the applicability of neural networks in various domains.
PrismaDV: Automated Task-Aware Data Unit Test Generation
Theory
Efficient ML
- PrismaDV generates task-aware data unit tests by analyzing downstream task code and dataset profiles.
- The SIFTA framework allows for continuous adaptation of data unit tests based on execution outcomes.
- PrismaDV outperforms existing task-agnostic and task-aware frameworks in generating relevant unit tests.
- The system addresses common shortcomings in data unit testing, such as manual maintenance and partial coverage.
Read more
PrismaDV: Automated Task-Aware Data Unit Test Generation
Summary
PrismaDV is introduced as an innovative AI system designed to enhance data validation processes by generating task-aware data unit tests. Unlike existing frameworks that operate in a task-agnostic manner, PrismaDV analyzes both the downstream task code and dataset profiles to identify data access patterns and infer implicit data assumptions. This allows for the creation of executable data unit tests that are tailored to the specific requirements of the tasks consuming the data. The authors also propose a framework called Selective Informative Feedback for Task Adaptation (SIFTA), which optimizes prompts based on the outcomes of data unit tests and downstream tasks, enabling continuous adaptation of the tests to evolving datasets. The evaluation of PrismaDV across two new benchmarks, encompassing 60 tasks across five datasets, demonstrates its superiority over both task-agnostic and task-aware baselines in generating relevant unit tests that effectively capture the end-to-end impact of data errors. Furthermore, SIFTA is shown to automatically learn prompts that outperform manually crafted ones, highlighting the system's potential for improving data validation in production environments.
Methodology
PrismaDV employs a compound AI system that integrates static code analysis and dataset profiling to identify data access patterns and infer implicit assumptions. It generates executable data unit tests tailored to specific downstream tasks. The SIFTA framework optimizes prompts based on the results of executed tests and tasks, enhancing the adaptability of the generated tests.
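To make "task-aware data unit test" concrete, the toy below checks exactly the columns a downstream task reads plus the implicit assumptions on them. In PrismaDV those inputs would be recovered by static code analysis and dataset profiling; here they are passed in by hand, and the check set is deliberately minimal.

```python
import pandas as pd

def make_unit_tests(df: pd.DataFrame, used_columns, non_null=(), positive=()):
    """Generate simple data unit tests from the columns a task actually uses
    and the assumptions it implies. Each check returns (name, passed)."""
    checks = [(f"column '{c}' present", c in df.columns) for c in used_columns]
    checks += [(f"'{c}' has no nulls", df[c].notna().all()) for c in non_null]
    checks += [(f"'{c}' is strictly positive", (df[c] > 0).all()) for c in positive]
    return checks

df = pd.DataFrame({"price": [3.5, 4.0, 2.0], "qty": [1, 2, 0]})
for name, ok in make_unit_tests(df, ["price", "qty"], non_null=["price"], positive=["price"]):
    print("PASS" if ok else "FAIL", name)
```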
Results
The evaluation of PrismaDV on two benchmarks with 60 tasks across five datasets showed consistent performance improvements over existing frameworks, effectively generating unit tests that reflect the impact of data errors. The SIFTA framework demonstrated the ability to learn effective prompts that surpassed those created manually or by generic optimizers.
Implications
PrismaDV has significant implications for improving data validation processes in modern enterprises, potentially reducing the risk of data errors propagating through systems and enhancing the reliability of downstream applications. Its automated and task-aware approach could streamline data engineering workflows and reduce the manual burden on data teams.
A Scale-Adaptive Framework for Joint Spatiotemporal Super-Resolution with Diffusion Models
Generative Models
Time Series
Computer Vision
- Introduces a scale-adaptive framework for joint spatiotemporal super-resolution using diffusion models.
- Decomposes spatiotemporal SR into deterministic and stochastic components, enhancing model flexibility.
- Requires only three retuned hyperparameters for adaptation across different SR factors.
- Demonstrated effectiveness on precipitation data, supporting applications in climate science.
Read more
A Scale-Adaptive Framework for Joint Spatiotemporal Super-Resolution with Diffusion Models
Summary
This paper presents a novel scale-adaptive framework for joint spatiotemporal super-resolution (SR) using diffusion models, addressing the limitations of existing models that are typically designed for fixed spatial and temporal SR factors. The authors propose a method that decomposes spatiotemporal SR into a deterministic prediction of the conditional mean and a residual conditional diffusion model, with an optional mass-conservation transform to maintain total precipitation amounts. The framework allows for the reuse of the same architecture across various SR factors by retuning only three hyperparameters before retraining, thus enabling efficient adaptation to different datasets without the need for redesigning the model architecture. The proposed method is validated on reanalysis precipitation data over France, demonstrating its capability to span super-resolution factors from 1 to 25 in space and 1 to 6 in time, yielding realistic ensembles that support climate-impact applications such as downscaling. This work significantly advances the application of deep learning in climate science, particularly in the context of precipitation data, which is characterized by high intermittency and complex multiscale interactions.
Methodology
The authors decompose the spatiotemporal super-resolution task into a deterministic prediction of the conditional mean and a residual diffusion model. They utilize three hyperparameters—diffusion noise schedule amplitude, temporal context length, and an optional mass-conservation function—to adapt the model for different spatial and temporal SR factors. The model is trained separately for each factor pair while sharing the same architecture.
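The optional mass-conservation transform can be illustrated independently of the diffusion model: rescale each high-resolution block so its total matches the corresponding coarse cell, assuming the low-resolution field stores the coarse-cell mean. The function name and the handling of all-zero blocks are assumptions.

```python
import numpy as np

def conserve_mass(hr, lr, factor):
    """Rescale each high-resolution block so its total matches the coarse cell,
    keeping total precipitation unchanged after super-resolution.
    hr: [H*f, W*f] generated field, lr: [H, W] coarse-cell means, f = SR factor."""
    H, W = lr.shape
    blocks = hr.reshape(H, factor, W, factor)
    block_sums = blocks.sum(axis=(1, 3), keepdims=True)
    target = lr[:, None, :, None] * factor ** 2            # required total per coarse cell
    scale = np.where(block_sums > 0, target / block_sums, 0.0)
    return (blocks * scale).reshape(H * factor, W * factor)

lr = np.random.rand(4, 4)
hr = np.random.rand(16, 16)
hr_cons = conserve_mass(hr, lr, factor=4)
assert np.allclose(hr_cons.reshape(4, 4, 4, 4).sum(axis=(1, 3)), lr * 16)
```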
Results
The proposed framework successfully spans super-resolution factors from 1 to 25 in space and 1 to 6 in time, producing realistic precipitation ensembles. The model demonstrates improved adaptability and performance in generating high-resolution precipitation data compared to traditional methods.
Implications
This framework has significant implications for climate science, particularly in enhancing the accuracy of precipitation forecasts and supporting various applications such as flood forecasting and climate impact assessments. It allows for efficient model reuse across different datasets, facilitating broader adoption of deep learning techniques in environmental research.
JEPAMatch: Geometric Representation Shaping for Semi-Supervised Learning
Computer Vision
Theory
Efficient ML
- JEPAMatch addresses class imbalance and slow convergence issues in semi-supervised learning.
- The method integrates geometric representation shaping inspired by LeJEPA into the training process.
- Extensive experiments show significant performance improvements over existing methods.
- The approach can be adapted to various FixMatch variants, enhancing its applicability.
Read more
JEPAMatch: Geometric Representation Shaping for Semi-Supervised Learning
Summary
The paper introduces JEPAMatch, a novel approach to semi-supervised learning (SSL) that addresses the limitations of existing methods, particularly those derived from FixMatch. While FixMatch has shown strong performance in image classification by combining weak and strong data augmentations with confidence-based pseudo-labeling, it suffers from issues such as class imbalance and slow convergence due to reliance on fixed confidence thresholds. JEPAMatch shifts the focus from conventional output thresholds to the geometric shaping of representations in latent space, inspired by the Latent-Euclidean Joint-Embedding Predictive Architectures (LeJEPA). The authors propose a new training objective that integrates a semi-supervised loss with a latent-space regularization term, promoting well-structured representations while maintaining the benefits of pseudo-labeling. Extensive experiments on CIFAR-100, STL-10, and Tiny-ImageNet demonstrate that JEPAMatch consistently outperforms existing baselines, accelerates convergence, and reduces computational costs compared to standard FixMatch-based methods.
Methodology
JEPAMatch combines a classical semi-supervised loss with a latent-space regularization term derived from the LeJEPA framework. The learning process is divided into two levels: a Curriculum Level for pseudo-label selection and a Representation Level for structuring the feature space. This dual-level optimization aims to improve classification performance and convergence speed.
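The two-level objective amounts to "semi-supervised loss plus a latent shaping term". In the sketch below the shaping term penalises deviation of the batch embedding statistics from zero mean and identity covariance, a simple stand-in for the LeJEPA-derived regulariser rather than its actual form; the weighting and function names are assumptions.

```python
import torch

def latent_regularizer(z):
    """Penalise deviation of the batch embedding statistics from zero mean and
    identity covariance. z: [N, D] embeddings of unlabeled samples."""
    mean = z.mean(dim=0)
    zc = z - mean
    cov = zc.T @ zc / (z.shape[0] - 1)
    eye = torch.eye(z.shape[1], device=z.device)
    return (mean ** 2).sum() + ((cov - eye) ** 2).sum()

def jepamatch_style_loss(sup_loss, pseudo_loss, z_unlabeled, lam=0.1):
    """Curriculum level (supervised + confidence-filtered pseudo-label loss)
    plus representation level (latent shaping), combined as described above."""
    return sup_loss + pseudo_loss + lam * latent_regularizer(z_unlabeled)

loss = jepamatch_style_loss(torch.tensor(0.8), torch.tensor(0.3), torch.randn(64, 128))
```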
Results
The proposed JEPAMatch method consistently outperformed existing baseline methods across multiple datasets, including CIFAR-100, STL-10, and Tiny-ImageNet. It demonstrated significantly faster convergence rates and reduced computational costs compared to traditional FixMatch-based approaches.
Implications
JEPAMatch has the potential to enhance semi-supervised learning applications in various domains, particularly where labeled data is scarce. Its ability to improve representation quality and convergence dynamics could lead to more efficient training processes in real-world scenarios.