AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

45 Papers today
8h Update frequency
7 Days of history
Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity
Hamed Khosravi, Xiaoming Huo
Theory
  • Introduces a framework for piecewise-stationary low-rank linear contextual bandits.
  • Establishes an identification boundary for recovering moving subspaces under scalar feedback.
  • Develops the SPSC algorithm that interleaves probing and exploitation to adapt to subspace changes.
  • Demonstrates significant performance improvements over existing methods in empirical evaluations.
Spectral Souping: A Unified Framework for Online Preference Alignment
Yinlam Chow, Guy Tennenholtz, Ted Yun, James Harrison, Arthur Gretton, Andre Barreto, Bo Dai
NLP Large Language Models Reinforcement Learning
  • Introduction of Spectral Souping for online preference alignment in LLMs.
  • Discovery of a universal spectral representation that aids in model merging.
  • Two-phase methodology: offline training of specialized policies and online adaptation.
  • Significant performance improvements over existing methods.
FedCoE: Bridging Generalization and Personalization via Federated Coordinated Dual-level MoEs
Penglin Dai, Fulian Li, Xincao Xu, Junhua Wang, Lixin Duan, Xiao Wu
Federated Learning
  • FedCoE balances global generalization and local personalization in federated learning.
  • The framework utilizes a dual-level mixture-of-experts architecture to handle heterogeneous data.
  • A shared gating network synchronizes expert selection across clients, addressing gating inconsistency.
  • An adaptive mechanism allows new clients to quickly access global experts, improving cold-start performance.
Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers
Xinzhe Yuan, Xiang Peng, Bin Gu, Huan Xiong
NLP Large Language Models Efficient ML
  • Introduces a plug-and-play framework for spiking operators in Transformers.
  • Decomposes nonlinear computations into spike-friendly primitives.
  • Supports common Transformer nonlinearities without fine-tuning.
  • Demonstrates minimal accuracy loss (<1%) across various tasks.
Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection
Fatemeh Pesaran zadeh, Seyeon Choi, Xing Han Lù, Siva Reddy, Gunhee Kim
NLP Large Language Models Efficient ML
  • WEASEL is the first data selection approach designed for offline web agent training, focusing on out-of-domain generalization and training efficiency.
  • The method employs a greedy algorithm to optimize trajectory selection based on importance and diversity.
  • Target-centered AXTree pruning is introduced to enhance training efficiency by removing irrelevant content.
  • The approach includes generating style-consistent reasoning traces to improve performance in reasoning-native models.
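A greedy importance-diversity trade-off can be sketched as follows. This is a generic illustration, not WEASEL's actual objective: the scoring rule (importance minus a penalty on cosine similarity to already-selected items), the `lam` weight, and all names are assumptions made for the example.

```python
import numpy as np

def greedy_select(importance, features, k, lam=0.5):
    """Greedily pick k items, trading importance against redundancy.

    Score = importance - lam * (max cosine similarity to anything already picked).
    This scoring rule is an illustrative assumption, not the paper's objective.
    """
    importance = np.asarray(importance, dtype=float)
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T                      # pairwise cosine similarity
    max_sim = np.zeros(len(importance))        # redundancy w.r.t. the selected set
    candidates = set(range(len(importance)))
    selected = []
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda i: importance[i] - lam * max_sim[i])
        selected.append(best)
        candidates.remove(best)
        max_sim = np.maximum(max_sim, sim[best])
    return selected

# Items 0 and 1 are near-duplicates; item 2 is less important but different.
importance = [1.0, 0.9, 0.5]
features = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
picked = greedy_select(importance, features, k=2, lam=1.0)
```

With a strong diversity weight, the second pick skips the near-duplicate in favor of the dissimilar item.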
OmniISR: A Unified Framework for Centralized and Federated Learning via Intermediate Supervision and Regularization
Wei-Bin Kou, Guangxu Zhu, Ming Tang, Chen Zhang, Lisheng Wu, Lei Zhou, Yujiu Yang
Federated Learning Theory Optimization
  • Introduces a unified framework that integrates centralized and federated learning.
  • Utilizes intermediate supervision and regularization to address optimization challenges.
  • Provides theoretical guarantees for convergence and gradient alignment.
  • Demonstrates significant performance improvements in both CL and FL settings.
Behavior-Consistent Deep Reinforcement Learning
Marcel Hussing, Liv G. d'Aliberti, Claas Voelcker, Benjamin Eysenbach, Eric Eaton
Reinforcement Learning Robotics Theory
  • Introduction of behavior-consistent reinforcement learning (BRL) as a new framework.
  • Establishment of a theoretical link between policy divergence and Q-function disagreement.
  • Identification of challenges in high-entropy maximum-entropy RL.
  • Development of Q-value Expectile Disagreement (QED) for improved behavioral consistency.
CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation
SeungJeh Chung, Geonho Park, Misong Kim, HyeongYeop Kang
Generative Models Optimization Computer Vision
  • CAdam reduces Gaussian counts by 85%–97% compared to standard densification methods.
  • The framework addresses the Densification Dilemma by leveraging statistical signal verification.
  • CAdam employs a novel approach that combines momentum-based verification and context-aware selection.
  • The method maintains comparable perceptual quality while improving memory efficiency.
Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine
Wei Huang, Andi Han, Mingyuan Bai, Huanjian Zhou, Qixin Zhang, Taiji Suzuki, Kenji Fukumizu
Generative Models Theory Efficient ML
  • Introduces Score-induced Latent Diffusion (SiLD) as a two-stage learning framework for diffusion models.
  • Proves convergence guarantees and establishes that sample complexity depends on intrinsic dimension, not ambient dimension.
  • Demonstrates empirical success on various datasets, outperforming VAE-based latent diffusion models.
  • Establishes a novel training strategy that integrates manifold learning and density estimation under a single objective.
Gaussian Sheaf Neural Networks
André Ribeiro, Ana Luiza Tenório, Tiago da Silva, Diego Mesquita
Graph Learning Theory
  • Introduction of Gaussian Sheaf Neural Networks (GSNNs) for learning with Gaussian-distributed node features.
  • Development of a new Laplacian operator that generalizes the sheaf Laplacian for Gaussian distributions.
  • GSNNs demonstrate superior performance compared to traditional GNNs on both synthetic and real-world datasets.
  • The framework effectively preserves the geometric and algebraic structure of Gaussian parameters during message passing.
Fast Reconstruction of Exact Maxwell Dynamics from Sparse Data
Dan DeGenaro, Xin Li, Obed Amo, Michael Pokojovy, Sarah Adel Bargal, Markus Lange-Hegermann, Bogdan Raiţă
Theory Efficient ML Optimization
  • Introduction of FLASH-MAX, a shallow neural network that predicts electromagnetic fields from sparse data.
  • Each hidden neuron in FLASH-MAX represents an exact solution to Maxwell's equations, ensuring physical validity by construction.
  • Achieves sub-1% relative validation error from about 1,000 observations in seconds, with zero PDE residual.
  • Demonstrates that embedding governing structures into the model improves the trade-off between accuracy and optimization speed.
Same Target, Different Basins: Hard vs. Soft Labels for Annotator Distributions
Mirerfan Gheibi, Gashin Ghazizadeh
Computer Vision Theory Optimization
  • Hard-label delivery methods can improve learning outcomes when annotations are sparse.
  • Multipass and SLS methods match soft-label training when full annotator distributions are available.
  • The preservation of the example-to-distribution match is crucial for effective learning.
  • SLS and soft-label cross-entropy optimize the same expected objective, allowing for clearer comparisons.
Nonparametric Learning and Earning with One-Point Feedback under Nonstationarity
Xiangyu Yang, Feng Xu, Jian-Qiang Hu, Jiaqiao Hu
Optimization Theory
  • Proposes a nonparametric learning framework for dynamic pricing under nonstationarity.
  • Utilizes one-point feedback for revenue-based gradient approximations.
  • Incorporates a restarting mechanism to adapt to changing market conditions.
  • Introduces a meta-learning layer to handle unknown nonstationarity levels.
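For intuition, the standard one-point gradient estimator builds a gradient estimate from a single noisy function value per step. The sketch below is that textbook estimator on a toy linear "revenue" function, not the paper's algorithm; the function names, the value of `delta`, and the toy objective are assumptions.

```python
import numpy as np

def one_point_gradient(f, x, delta, rng):
    """One-point estimate of the gradient of a smoothed version of f at x.

    Uses a single evaluation f(x + delta * u) with u uniform on the unit sphere.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                 # random direction on the unit sphere
    return (d / delta) * f(x + delta * u) * u

# Toy check on a linear "revenue" f(p) = a . p, whose gradient is a.
a = np.array([1.0, -2.0, 0.5])
rng = np.random.default_rng(0)
x = np.zeros(3)
samples = [one_point_gradient(lambda p: a @ p, x, 0.1, rng) for _ in range(20000)]
grad_estimate = np.mean(samples, axis=0)   # averages toward a
```

Each individual estimate is very noisy, which is why such methods typically pair the estimator with small step sizes and, under nonstationarity, with restarts.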
Q-SYNTH: Hybrid Quantum-Classical Adversarial Augmentation for Imbalanced Fraud Detection
Adam Innan, Mansour El Alami, Nouhaila Innan, Muhammad Shafique, Mohamed Bennai
Generative Models
  • Q-SYNTH is a hybrid quantum-classical framework for fraud detection.
  • It synthesizes minority-class fraud samples to address class imbalance.
  • The framework shows improved statistical fidelity and competitive downstream performance.
  • Q-SYNTH offers a favorable trade-off between distributional fidelity and detection performance.
Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning
Nasehatul Mustakim, Lucas Lehnert
Reinforcement Learning Theory Efficient ML
  • Introduces a theoretical model for OOD generalization in RL agents using POMDPs.
  • Extends state abstraction frameworks to POMDPs and proposes a novel successor-weighted model reduction.
  • Derives a performance loss bound that highlights the relationship between abstract state space size and OOD generalization.
  • Demonstrates that smaller abstract state spaces improve test performance and facilitate generalization to complex tasks.
Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach
Herman Bergström, Aditya Mehrotra, Rahul G. Krishnan
Multimodal
  • CoMET framework allows for multimodal classification without fine-tuning.
  • PCA is sufficient for effective dimensionality reduction in embeddings.
  • PALPooling improves representation quality without backpropagation.
  • Achieves state-of-the-art results across various multimodal benchmarks.
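The PCA step can be illustrated with a few lines of NumPy. This is a generic sketch of PCA on an embedding matrix, not the CoMET pipeline; the random matrix stands in for frozen-encoder embeddings, and the function name and dimensions are assumptions.

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X onto their top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                        # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # coordinates in the top-k subspace

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((50, 10))         # stand-in for frozen embeddings
reduced = pca_reduce(embeddings, k=3)
```

No gradients are involved, which is consistent with a fine-tuning-free setup: the reduced features can be fed to any lightweight classifier.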
Ada2MS: A Hybrid Optimization Algorithm Based on Exponential Mixing of Elementwise and Global Second-Moment Estimates
Meng Zhu, Quan Xiao, Weidong Min
Optimization
  • Ada2MS combines the advantages of AdamW and Momentum SGD to improve optimization performance.
  • The algorithm utilizes exponential interpolation between elementwise and global second-moment estimates.
  • Ada2MS maintains stability while gradually introducing SGD-like characteristics during training.
  • Experimental results show that Ada2MS performs competitively in visual tasks.
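One plausible reading of "exponential interpolation" is a geometric mix of the two second-moment estimates, v_elem^(1 - alpha) * v_global^alpha. The sketch below follows that reading only; the mixing form, the use of the mean as the global estimate, and the names `mixed_second_moment` and `alpha` are assumptions, not details confirmed by the summary.

```python
import numpy as np

def mixed_second_moment(v_elem, alpha):
    """Geometric interpolation between elementwise and global second moments.

    alpha = 0 keeps the per-coordinate (Adam-like) estimate;
    alpha = 1 uses one shared scalar (SGD-like uniform scaling).
    The mixing form itself is an assumption for illustration.
    """
    v_elem = np.asarray(v_elem, dtype=float)
    v_global = v_elem.mean()                       # single scalar for all coordinates
    return v_elem ** (1.0 - alpha) * v_global ** alpha

v = np.array([1.0, 4.0, 16.0])
adam_like = mixed_second_moment(v, 0.0)            # unchanged per-coordinate values
sgd_like = mixed_second_moment(v, 1.0)             # every entry equals the mean
```

Increasing `alpha` over training would move the preconditioner from per-coordinate scaling toward a single global step size.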
AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback
Miaobo Hu, Shuhao Hu, Bokun Wang, Ruohan Wang, Xin Wang, Xiaobo Guo, Daren Zha, Jun Xiao
Reinforcement Learning Large Language Models Optimization
  • AGPO introduces adaptive clipping and temperature sampling to improve training stability and efficiency.
  • The method utilizes group-level statistics to control update magnitude and exploration dynamically.
  • AGPO outperforms PPO and GRPO on multiple benchmarks.
  • The approach is critic-free, simplifying the training process while maintaining performance.
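The critic-free part can be illustrated with the group-relative advantage used by GRPO-style methods, where the baseline comes from the statistics of a group of sampled responses rather than a learned value function. AGPO's adaptive clipping and temperature rules are not reproduced here; the function name and `eps` are assumptions.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)        # group mean acts as the baseline

# Four sampled responses to one prompt, scored 1 (correct) or 0 (incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

The group mean and standard deviation are the same kind of group-level statistics that an adaptive scheme could use to tune clipping or sampling temperature.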
A New Framework to Analyse the Distributional Robustness of Deep Neural Networks
Divij Khaitan, Subhashis Banerjee
Theory Interpretability Computer Vision
  • Introduces a framework for analyzing distributional robustness in deep neural networks.
  • Uses Bernoulli distributions to model interactions between layer weights and activations.
  • Demonstrates the ability to distinguish between memorization and generalization in neural networks.
  • Shows that distribution shifts negatively impact the separation metrics used for robustness diagnostics.
Robust Recommendation from Noisy Implicit Feedback: A GMM-Weighted Bayes-label Transition Matrix Framework
Zongyu Li, Xuanyu Liu, Gongce Cao, Shirui Sun, Yaqi Fang, Yongshuai Yu
Theory Optimization
  • Introduction of RGBT framework that combines GMM with BLTM for robust recommendations.
  • Theoretical guarantees of full sample utilization and low-variance estimation.
  • Demonstrated effectiveness of RGBT in utilizing noisy samples compared to traditional methods.
  • Superior calibration capability of transition matrix over state-of-the-art approaches.
Fast and Stable Triangular Inversion for Delta-Rule Linear Transformers
Aleksandros Sobczyk, Gioele Gottardo, Christos K. Matzoros, Mirko De Vita, Filip Skogh, Anastasios Zouzias, Jiawei Zhuang
NLP Large Language Models Efficient ML
  • Introduces a systematic analysis of triangular matrix inversion methods for Delta-Rule Linear Transformers.
  • Highlights the importance of numerical stability in maintaining model accuracy during matrix inversion.
  • Demonstrates up to a 4.3× speed-up on NPUs compared to existing methods.
  • Focuses on leveraging hardware efficiency through matrix product-rich algorithms.
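As an example of what a matrix-product-rich inversion can look like, a unit lower-triangular matrix L = I - A (with A strictly lower triangular, hence nilpotent) satisfies L^{-1} = (I + A)(I + A^2)(I + A^4)..., which needs only about log2(n) squarings. This is a well-known identity shown for illustration; it is not claimed to be the paper's method, and it ignores the numerical-stability questions the paper studies. The function name is an assumption.

```python
import numpy as np

def inv_unit_lower(L):
    """Invert a unit lower-triangular matrix using only matrix products.

    Writes L = I - A with A strictly lower triangular (so A^n = 0) and uses
    L^{-1} = (I + A)(I + A^2)(I + A^4)... until the power reaches n.
    """
    n = L.shape[0]
    eye = np.eye(n)
    A = eye - L
    inv = eye + A                  # covers powers A^0 and A^1
    power = 1
    while 2 * power < n:
        A = A @ A                  # A now holds the original A^(2 * power)
        power *= 2
        inv = inv @ (eye + A)      # doubles the number of series terms covered
    return inv

rng = np.random.default_rng(0)
L = np.eye(8) + 0.5 * np.tril(rng.standard_normal((8, 8)), k=-1)
L_inv = inv_unit_lower(L)
```

Compared with forward substitution, which is inherently sequential, this form maps onto hardware that is fast at dense matrix multiplication.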
Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics
Lucky Verma
Theory
  • Weight decay is a critical parameter influencing the transition between memorization, generalization, and collapse in transformers.
  • Two online diagnostics are introduced to track training dynamics effectively and at lower computational costs.
  • The study identifies a critical weight decay threshold (λc = 0.0158) and an empirical power-law exponent for time-to-grok.
  • The findings are consistent across various model architectures, suggesting broader applicability beyond transformers.
GraphDiffMed: Knowledge-Constrained Differential Attention with Pharmacological Graph Priors for Medication Recommendation
Krati Saxena, Tomohiro Shibata
Graph Learning Time Series Interpretability
  • Introduction of the first dual-scale application of Differential Attention v2 for medication recommendation.
  • Demonstrated improvements in recommendation quality and safety performance over existing methods.
  • Provided a transparent analysis of the impact of knowledge constraints on safety-performance balance.
  • Showed that higher DDI rates in recommendations can reflect more comprehensive solutions for complex cases.
Divide and Contrast: Learning Robust Temporal Features without Augmentation
Abdul-Kazeem Shamba, Kerstin Bach, Gavin Taylor
Time Series
  • Di-COT eliminates the need for data augmentation and multiple encoder passes, reducing computational overhead.
  • The method contrasts overlapping sub-blocks within time-series instances, ensuring meaningful representation learning.
  • Di-COT reformulates temporal contrastive learning as a cross-entropy classification task for dense supervision.
  • The framework achieves state-of-the-art performance on multiple benchmarks while maintaining low training times.
Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases
Jingwen Liu, Ezra Edelman, Surbhi Goel, Bingbin Liu
Optimization Theory Efficient ML
  • The small-vs-large gap exists across various tasks and architectures, indicating that fewer samples can lead to faster learning.
  • Sampling biases from smaller datasets enhance optimization by modulating layer-wise updates, facilitating quicker convergence.
  • Empirical evidence shows that even random labels can yield speedups similar to those with real labels, underscoring the role of sampling bias.
  • Adjustments to initialization and learning rates can significantly reduce the small-vs-large gap, highlighting the importance of parameter-wise interventions.
Robust Personalized Recommendation under Hidden Confounding in MNAR
Zongyu Li, Wanting Su, Tianyu Xia
Theory Optimization
  • Introduces a novel framework (PUID) for personalized estimation of hidden confounding strength in recommender systems.
  • Develops an entropy-based sensitivity estimator to quantify the influence of unobserved confounders.
  • Proposes a benchmark-guided variant (BPUID) that enhances robustness and predictive accuracy.
  • Demonstrates significant performance improvements over global methods in extensive experiments on real-world datasets.
Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor
Xiaocan Li, Shiliang Wu, Zheng Shen
Reinforcement Learning Large Language Models Efficient ML
  • Introduces a structural decomposition of MXFP4 quantization error into three components: scale bias, deadzone truncation, and grid noise.
  • Demonstrates that each error component corresponds to specific RL failure modes affecting training outcomes.
  • Proposes targeted corrections for each failure mode, improving the accuracy of RL post-training.
  • Empirical results show significant recovery of accuracy in large language models post-quantization.
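To make the deadzone concrete, here is a simplified MXFP4-style quantizer: a block shares one power-of-two scale, and each element snaps to the nearest FP4 (E2M1) magnitude. It is a sketch under assumptions: real MX blocks hold 32 elements, ties here go to the lower grid point rather than round-to-nearest-even, and the function name is illustrative. It does not implement the paper's corrections.

```python
import numpy as np

# Representable magnitudes of FP4 E2M1.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_block(x):
    """Quantize one block with a shared power-of-two scale and FP4 elements."""
    x = np.asarray(x, dtype=float)
    amax = np.abs(x).max()
    if amax == 0.0:
        return np.zeros_like(x)
    # Shared scale: exponent of the block maximum minus the largest FP4 exponent (2).
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    mags = np.abs(x) / scale
    # Nearest grid point; anything above 6 is effectively clamped to 6.
    idx = np.abs(mags[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return np.sign(x) * FP4_GRID[idx] * scale

block = np.array([4.0, 0.1, -1.0, 0.3])
quantized = mxfp4_quantize_block(block)
```

In this block the scale is 1, so the entry 0.1 falls below half the smallest step and is truncated to zero, which is the deadzone effect; 0.3 survives but is rounded to 0.5.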
Learning to Think in Physics: Breaking Shortcut Learning in Scientific Diffusion via Representation Alignment
Haozhe Jia, Pengyu Yin, Wenshuo Chen, Shaofeng Liang, Lei Wang, Bowen Tian, Xiucheng Wang, Nanqian Jia, Yutao Yue
Generative Models
  • Introduction of REPA-P, a teacher-free framework for aligning intermediate representations with physical states.
  • Demonstration of improved convergence and reduced physics residuals across multiple PDE tasks.
  • Validation of the hypothesis that aligning latent features with physical quantities enhances model robustness.
  • Architecture-agnostic approach applicable to both U-Net and Diffusion Transformer backbones.
Towards Understanding Self-Pretraining for Sequence Classification
Omar Coser, Loredana Zollo, Paolo Soda, Antonio Orvieto
Theory Optimization
  • Self-pretraining (SPT) significantly enhances Transformer model performance in sequence classification tasks.
  • Learning useful attention patterns from label supervision alone, starting from random initialization, is a central challenge.
  • Learning proximity interactions is identified as a key source of the improvements from SPT.
  • SPT gains persist across different model depths, data sources, and pretraining durations.
CRAFT: Conflict-Resolved Aggregation for Federated Training
Ziqi Wang, Qiang Liu, Nils Thuerey
Federated Learning
  • CRAFT reformulates federated aggregation as a constrained least-squares problem to ensure conflict-free updates.
  • The method employs a momentum-like reference direction to preserve useful temporal information during aggregation.
  • Layer-wise adaptation allows for conflict resolution at varying granularities, making it suitable for deep neural networks.
  • Extensive experiments demonstrate improved mean accuracy and reduced accuracy disparity across clients.
Dynamic Shapley Computation
Xuan Yang, Hsi-Wen Chen, Ming-Syan Chen, Jian Pei
Theory Efficient ML Interpretability
  • D-Shap transforms Shapley computation into a reusable and incremental process.
  • The framework allows for efficient updates in dynamic settings, addressing both task and player changes.
  • Self-valuation enables the construction of the initial Shapley matrix directly from training data.
  • D-Shap achieves substantial reductions in computational costs, making it practical for real-world applications.
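For context on the cost being reduced, the exact Shapley value averages a player's marginal contribution over all coalitions, which takes on the order of 2^n value evaluations and must be redone from scratch when players or tasks change. The sketch below is this textbook baseline, not D-Shap; the toy value function and names are assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Probability weight of a coalition of size k preceding player p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi

def toy_value(coalition):
    """Hypothetical utility: grows with the square of coalition size."""
    return len(coalition) ** 2

phi = shapley_values(["a", "b", "c"], toy_value)
```

Because the toy game is symmetric, the three players split the grand-coalition value of 9 equally.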
OpenSeisML: Open Large-Scale Real Seismic and well-log Dataset for Generative AI
Ipsita Bhar, Huseyin Tuna Erdinc, Thales Souza, Charles Jones, Felix J. Herrmann
Generative Models
  • Introduction of OpenSeisML, a large-scale dataset for seismic inversion.
  • Automated data curation pipeline enhances reproducibility and efficiency.
  • Dataset includes real seismic and well-log data, addressing the scarcity of high-quality datasets.
  • Supports training of generative models for uncertainty quantification in subsurface properties.
TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs -- A Case Study in Mental Health
Yuang Fan, Lilin Xu, Millie Wu, Jingping Nie, Qingyu Chen, Yuzhe Yang, Zhuo Zhang, Xin Liu, Subigya Nepal, Xiaofan Jiang, Xuhai 'Orson' Xu
Time Series Reinforcement Learning Large Language Models
  • TimeSRL introduces a two-stage framework for time-series behavioral modeling that enhances generalizability.
  • The model uses semantic abstractions to improve reasoning over longitudinal behavioral data.
  • TimeSRL achieves state-of-the-art performance in mental health prediction, outperforming traditional ML and LLM methods.
  • The approach demonstrates robustness against distribution shifts across different datasets.
LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging
Yassine Maziane, Ammar Mahran, Artavazd Maranjyan, Peter Richtárik
Optimization Federated Learning Efficient ML
  • LOSCAR-SGD combines local training, sparse model averaging, communication-computation overlap, and worker-specific local-step counts.
  • The delay-corrected merge rule preserves local progress during communication delays.
  • Theoretical guarantees are provided for convergence in smooth non-convex settings.
  • Empirical results show significant reductions in training time with the proposed method.
Latent Process Generator Matching
Lukas Billera, Hedwig Nora Nordlinder, Ben Murrell
Generative Models Theory Optimization
  • Introduces latent process generator matching, extending generator matching theory to time-dependent latent processes.
  • Allows for learning generators of stochastic processes that match one-time marginal distributions on the image space.
  • Generalizes existing methods by accommodating a wider variety of latent spaces, including continuous and manifold-valued processes.
  • Provides sufficient conditions for valid loss functions, recovering results from previous works as corollaries.
DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU
Weizhe Chen, Miao Zhang, Junpeng Jiang, Yaping Li, Weili Guan, Liqiang Nie
NLP Large Language Models Efficient ML
  • DASH provides a differentiable search framework for hybrid attention architecture design, moving beyond manual and selector-style methods.
  • The framework allows for architecture-only optimization, significantly reducing search time and token usage.
  • DASH consistently outperforms existing hybrid attention design baselines and achieves better performance than Jet-Nemotron models.
  • The method demonstrates that high-quality hybrid architectures can be obtained quickly, paving the way for routine design applications.
Robust Subspace-Constrained Quadratic Models for Low-Dimensional Structure Learning
Zheng Zhai, Xiaohui Li
Optimization Theory
  • Introduction of a robust SCQM that accommodates various noise distributions.
  • Development of a gradient descent algorithm with orthogonality-preserving updates.
  • Theoretical analysis showing improved robustness with ℓp loss functions.
  • Extensive experiments confirming superior performance over traditional methods.
A Deployment Audit of Release-Side Risk in Conformal Triage under Prevalence Shift
Chengze Li, Xiao Liu, Hanrong Zhang, Haiyang Peng, Yanghao Ruan, Huanhuan Ma, Chunyu Miao, Qichao Zhou, Xiangrong Qi, Philip Yu
Theory
  • Introduces a leakage-aware deployment audit for evaluating release-side risk in conformal triage.
  • Demonstrates that traditional metrics can obscure the safety of release decisions under prevalence shift.
  • Identifies the necessity of separating correction, calibration, and evaluation to ensure safety in deployment.
  • Shows that lower review rates can lead to the unsafe release of patients who should not be cleared.
Axiomatizing Neural Networks via Pursuit of Subspaces
Mehmet Yamac, Mert Duman, Ugur Akpinar, Felix Rojas Casadiego, Serkan Kiranyaz, Marcel van Gerven, Moncef Gabbouj
Theory Interpretability
  • Introduces the Pursuit of Subspaces (PoS) framework as an axiomatic approach to understanding neural networks.
  • Establishes four geometric axioms that explain how DNNs learn compact representations.
  • Provides a unified interpretation of architectural mechanisms and their roles in representation and generalization.
  • Connects existing neural architectures to a geometric foundation, facilitating the design of explainable models.
TriForces: Augmenting Atomistic GNNs for Transferable Representations
Ali Ramlaoui, Alexandre Duval, Hannah Bull, Victor Schmidt, Hugues Talbot, Fragkiskos D. Malliaros, Joseph Musielewicz
Graph Learning
  • TriForces introduces a three-stream architecture for atomistic GNNs, enhancing representation transferability.
  • The framework utilizes self-supervised learning to improve the organization and quality of learned representations.
  • Significant performance improvements were observed on multiple benchmarks without the need for Density Functional Theory (DFT) labels.
  • The model enables efficient similarity retrieval in compositional, structural, or joint embedding spaces.
Compositional Transduction with Latent Analogies for Offline Goal-Conditioned Reinforcement Learning
Junseok Kim, Dohyeong Kim, Mineui Hong, Songhwai Oh
Reinforcement Learning Robotics Theory
  • Introduces analogy transduction for synthesizing goal-reaching behaviors across varying contexts.
  • Proposes a novel task-endogenous analogy representation that captures essential changes for optimal execution.
  • Develops the Compositional Transduction with latent Analogies (CTA) approach for offline GCRL.
  • Demonstrates significant performance improvements over existing methods in empirical evaluations.
Beyond Numerical Features: CNN-Driven Algorithm Selection via Contour Plots for Continuous Black-Box Optimization
Yiliang Yuan, Xiang Shi, Mustafa Misir
Optimization
  • Introduces a probing-based AAS formulation using contour maps for continuous BBO.
  • Demonstrates the effectiveness of CNNs in predicting solver performance from visual representations.
  • Shows significant performance improvements over traditional single best solver approaches.
  • Competes well with feature-based methods like ELA and Deep-ELA.
Most Transformer Modifications Still Do Not Transfer at 1-3B: A 2020-2026 Update to Narang et al. (2021) with Downstream Evaluation and a Noise Floor
Yang Zhao, Jiahao Lu, Bin Huang, Guhua Zhang, Jie Zhou
Large Language Models NLP Theory
  • Most Transformer modifications do not transfer effectively at larger scales (1-3B parameters).
  • Only two out of 20 modifications showed significant improvements at 1.2B, with one failing at 3B.
  • Downstream evaluation metrics are more reliable than pretraining perplexity for assessing model performance.
  • The gap between validation loss and downstream task accuracy has increased for attention-output modifications.
Automated Kernel Discovery Towards Understanding High-dimensional Bayesian Optimization
Taeyoung Yun, Woocheol Shin, Inhyuck Song, Jaewoo Lee, Jinkyoo Park
Optimization
  • Introduces Kernel Discovery, an LLM-driven framework for high-dimensional BO.
  • Employs a two-stage approach for kernel generation and validation.
  • Proposes LOO-CRPS as a robust selection criterion to avoid overfitting.
  • Achieves superior performance on high-dimensional BO benchmarks compared to existing methods.
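A leave-one-out CRPS score can be sketched with the closed-form CRPS of a Gaussian predictive distribution, averaged over held-out points. The sketch assumes the leave-one-out predictive means and standard deviations are already available (for example from a Gaussian process); how the paper obtains them is not shown, and the function names are assumptions.

```python
import math

def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of the forecast N(mu, sigma^2) against observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def loo_crps(mus, sigmas, ys):
    """Average CRPS over points, each predicted with that point held out."""
    scores = [gaussian_crps(m, s, y) for m, s, y in zip(mus, sigmas, ys)]
    return sum(scores) / len(scores)

# Hypothetical leave-one-out predictions for three observations.
score = loo_crps([0.0, 1.0, 2.0], [1.0, 0.5, 2.0], [0.1, 1.4, 1.0])
```

Lower is better: the score penalizes both inaccurate means and miscalibrated uncertainty, which is why it can discourage kernels that overfit the observed points.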
Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding
Paul Quinlan, Jeremy Levasseur, Qingguo Li, Xiaodan Zhu
Multimodal NLP Time Series
  • Chronicle is the first model to jointly pretrain on text and time series from scratch.
  • It utilizes a shared transformer architecture for both modalities, enhancing cross-domain representation learning.
  • Chronicle achieves competitive performance against state-of-the-art unimodal foundation models.
  • The model sets new benchmarks for frozen-embedding time series classification and multimodal forecasting.