AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
Beyond LoRA: Is Sparsity-Induced Adaptation Better?
Elijah Cadenhead, Cristian McGee, Xin Li, El Houcine Bergou, Aritra Dutta
Efficient ML Theory Optimization
  • Introduction of sparsity-induced adaptations (cLA and c3LA) to enhance LoRA variants.
  • Derivation of generalization error bounds for the proposed methods.
  • Empirical evaluation shows significant reductions in training time and memory usage.
  • Sparsity in LoRA adaptations maintains competitive performance across various tasks.
Controlled Dynamics Attractor Transformer
Cheng Zhang, Minnan Luo, Zesheng Yang, Ming Li, Yong-Jin Liu, Qinghua Zheng
Graph Learning Theory Optimization
  • CDAT integrates energy-based modeling with transformer architectures for improved stability and robustness.
  • The framework employs a mixture von Mises–Fisher attention energy and Hopfield refinement energy.
  • CANN-inspired modulation introduces a control interface for steering and stabilizing inference dynamics.
  • CDAT achieves state-of-the-art performance in graph anomaly detection and classification.
Taming Curvature: Architecture Warm-Up for Stable Transformer Training
Sameera Ramasinghe, Ajanthan Thalaiyasingam, Hadi Mohaghegh Dolatabadi, Chamin Hewa Koneputugodage, Gil Avraham, Violetta Shevchenko, Yan Zuo, Karol Pajak, Alexander Long
Optimization Large Language Models Theory
  • Introduces a fast online estimator for preconditioned Hessian eigenvalues to track curvature during training.
  • Demonstrates a correlation between training instabilities and surges in preconditioned curvature.
  • Proposes an architecture warm-up strategy to control curvature by gradually increasing network depth.
  • Shows that the proposed method reduces instabilities compared to state-of-the-art techniques.
Privacy from Symmetry: Orthogonally Equivariant Transformers for LLM Inference
Alexander Yukhimchuk, Andrey Shulga, Mladen Kolar, Martin Takáč
NLP Large Language Models
  • Introduces CONJFORMER, a transformer variant that enhances privacy during LLM inference.
  • Employs orthogonal obfuscation to prevent embedding inversion attacks.
  • Demonstrates significant reduction in token recovery rates while maintaining model performance.
  • Highlights the effectiveness of architectural symmetry in privacy preservation.
Flood and Harvest: The Provable Necessity of Trivia for Generating Valuable Mathematics via the Lens of Language Generation in the Limit
Xiaoyu Li, Andi Han, Dai Shi, Zheng Gao, Jiaojiao Jiang, Junbin Gao
Theory
  • Verification of mathematical statements rests not on taste alone but on structural conditions.
  • Sound coverage by verifiers allows for the assertion of valid statements while covering unseen valuable mathematics.
  • A phase transition occurs in trivia generation, where the count of trivia, rather than its rate, impacts the coverage of valuable mathematics.
  • An infinite stream of trivial outputs is a provable necessity for generating valuable mathematical insights.
Decompose Sparsely Where You Should, Absorb Densely Where You Should Not
Ruixuan Deng, Zehao Jin, Zekun Wang, Zihan Dong
NLP Large Language Models Interpretability
  • Introduces a parallel low-rank linear bottleneck to improve sparse autoencoder performance.
  • Demonstrates that a significant portion of activation content is dense and computationally necessary.
  • Finds that removing the dense component drastically increases cross-entropy, indicating its importance.
  • Suggests that current practices in training SAEs may overlook critical dense components, leading to inefficiencies.
Where Black-box Drug-Target Interaction Prediction Models Look: Cross-Method Explainability
Ali Vefghi, Zahed Rahmati, Mohammad Akbari
Interpretability Graph Learning
  • The study applies multiple post-hoc explainable AI (XAI) techniques to a black-box DTI prediction model.
  • It highlights the significance of bridge nodes and edges in linking drug and protein features.
  • The research demonstrates that interpretability can reveal critical insights into model decision-making processes.
  • Results indicate that explainability can guide the design of novel therapeutics by uncovering underlying mechanisms.
A Bifurcation Theory Framework for Gradient Descent on the Edge of Stability
Eric Gan
Optimization Theory
  • Development of a bifurcation theory framework for gradient descent applicable to overparameterized networks.
  • Decomposition of EoS dynamics into normal and tangent components, leading to convergence proofs.
  • Identification of the first Lyapunov coefficient as a key stability criterion for EoS training.
  • Unification of prior results, including the product-stability condition, under the new framework.
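As background for the edge-of-stability (EoS) regime summarized above, here is a minimal sketch of the classical stability threshold for gradient descent on a one-dimensional quadratic. With curvature (sharpness) `lam` and step size `eta`, the iterates contract when `eta * lam < 2` and grow when `eta * lam > 2`. This is a textbook illustration only and does not reproduce the paper's bifurcation framework; the function name `run_gd` and all constants are assumptions chosen for the example.

```python
def run_gd(lam: float, eta: float, x0: float = 1.0, steps: int = 50) -> float:
    """Run gradient descent on f(x) = 0.5 * lam * x**2 and return |x| after `steps`.

    `run_gd` is a hypothetical helper for illustration, not code from the paper.
    """
    x = x0
    for _ in range(steps):
        grad = lam * x      # f'(x) = lam * x
        x = x - eta * grad  # same as x *= (1 - eta * lam)
    return abs(x)


# eta * lam = 1.9 < 2: the update factor is -0.9, so |x| shrinks each step.
stable = run_gd(lam=1.0, eta=1.9)
# eta * lam = 2.1 > 2: the update factor is -1.1, so |x| grows each step.
unstable = run_gd(lam=1.0, eta=2.1)

print(stable < 1.0, unstable > 1.0)
```

The threshold `2 / eta` on the curvature is the quantity that EoS analyses typically track; the sketch only shows why crossing it changes the qualitative behavior of the iterates.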
Remember, Don't Re-read: Stateful ReAct Agents for Token-Efficient Autonomous Experimentation
Faramarz Jabbarvaziri
Large Language Models Optimization Efficient ML
  • Introduction of a stateful ReAct agent for autonomous experimentation.
  • Significant reduction in token consumption compared to stateless designs.
  • Maintains experimental history and reasoning across iterations.
  • Demonstrated effectiveness in hyperparameter tuning and code optimization tasks.
Rethinking Structural Anomaly Detection: From Decision Boundaries to Projection Operators
Alexander Bauer
Computer Vision
  • Introduces a geometric perspective on anomaly detection using projection operators.
  • Addresses the limitations of boundary-based methods in high-dimensional data settings.
  • Defines anomalies based on projection residuals, improving detection accuracy.
  • Unifies understanding of reconstruction-based methods through projection quality.
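To make the idea of a projection residual concrete, here is a minimal sketch that scores a point by its distance to its orthogonal projection onto a fixed linear subspace. The linear subspace, the orthonormal basis, and the names `project` and `residual_score` are all assumptions for illustration; the paper's projection operators may be learned or nonlinear.

```python
import math


def project(x, basis):
    """Orthogonally project x onto the span of an orthonormal `basis` (hypothetical helper)."""
    proj = [0.0] * len(x)
    for u in basis:
        coeff = sum(xi * ui for xi, ui in zip(x, u))         # inner product <x, u>
        proj = [pi + coeff * ui for pi, ui in zip(proj, u)]  # accumulate <x, u> * u
    return proj


def residual_score(x, basis):
    """Anomaly score: Euclidean norm of x minus its projection."""
    proj = project(x, basis)
    return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, proj)))


# Assumed "normal" subspace for the example: the xy-plane in R^3.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
normal_point = [2.0, -1.0, 0.0]     # lies in the subspace
anomalous_point = [2.0, -1.0, 3.0]  # sits 3 units off the subspace

print(residual_score(normal_point, basis), residual_score(anomalous_point, basis))
```

Under this view a reconstruction-based detector is only as good as its projection: a point is flagged when the projection fails to reproduce it.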
Graph Diffusion Residuals for Control-Function Instrumental Variables
Rui Wu, Zongyuan Chen, Hong Xie, Defu Lian, Enhong Chen
Graph Learning Theory
  • A-IHF is a deterministic graph-diffusion method for extracting residuals in control-function IV estimators.
  • The method effectively identifies treatment jumps and preserves relevant residual information for causal inference.
  • Theoretical analysis provides insights into error decomposition and performance guarantees.
  • A-IHF outperforms traditional methods in synthetic benchmarks and real-world applications.
The Weight Norm Sets the Grokking Timescale: A Causal Delay Law
Truong Xuan Khanh, Doan Hoang Viet, Luu Duc Trung, Phan Thanh Duc
Theory Interpretability
  • The weight norm causally controls the timescale of grokking, reconciling conflicting theories.
  • A matched-counterfactual clamp shows that grokking can occur at any weight norm, with time to grok following an exponential delay law.
  • The delay law exhibits a shared exponent across different tasks, indicating a scaling relationship.
  • Normalization techniques, such as LayerNorm, alter the influence of weight norm on grokking.
High-Dimensional Random Projection for Activation Steering in Language Models
Minh-Hieu Pham, Bach Do, Laziz Abdullaev, Tan Minh Nguyen, Khoat Than
NLP Large Language Models Theory
  • HiDRA captures richer discriminative signals in high-dimensional space, overcoming limitations of traditional DiM methods.
  • Theoretical analysis supports the existence of second-order signals under the superposition hypothesis.
  • HiDRA is a training-free, plug-in solution that enhances existing activation steering frameworks.
  • Experimental evaluations show consistent performance improvements across diverse LLMs and tasks.
Compressed Computation is (probably) not Computation in Superposition
Jai Bhagat, Sara Molas-Medina, Giorgi Giglemiani, Stefan Heimersheim
Theory Interpretability
  • The CC model's performance advantage is primarily due to a noisy mixing matrix rather than true computation in superposition.
  • Performance scales with the magnitude of the mixing matrix, indicating its critical role in the model's success.
  • A new SNMF baseline derived from the mixing matrix can replicate the qualitative loss profile of the CC model.
  • The learned neuron directions concentrate in the subspace associated with the top eigenvalues of the mixing matrix.
Unsupervised Learning for Missing Modalities in Multimodal Learning
Hassan Ismkhan, Hamid Bouchahcia
Multimodal
  • UL4M4 supports arbitrary numbers of modalities and missing patterns at the sample level.
  • The framework decouples the imputation process from downstream tasks, enhancing generalizability.
  • It achieves state-of-the-art performance in multimodal sentiment analysis under severe missing conditions.
  • The methodology includes a novel partial-modality distance metric and modality-specific normalization.
Federated Learning for Feature Generalization with Convex Constraints
Dongwon Kim, Donghee Kim, Sung Kuk Shyn, Kwangsu Kim
Federated Learning
  • FedCONST introduces convex constraints to enhance feature generalization in Federated Learning.
  • The method adaptively modulates updates based on the global model's parameter strength.
  • FedCONST effectively mitigates overfitting and improves generalization across diverse FL environments.
  • The approach is validated through theoretical analysis and extensive empirical testing.
Neither Parallel Nor Sequential: How DiffusionGemma Actually Commits Tokens
Ali Asaria, Tony Salomone, Deep Gandhi
NLP Large Language Models Generative Models
  • DiffusionGemma exhibits a partial left-to-right commit bias that is granularity-dependent.
  • Token commitment is only moderately ordered and shows significant sub-block disorder.
  • The model's performance is comparable to its autoregressive counterpart in certain tasks.
  • Commit confidence can predict correctness in specific regimes but not universally.
Policy Regret for Embedding Model Routing: Contextual Bandits with Low-Rank Experts
Yan Dai, Negin Golrezaei, Patrick Jaillet
Theory Optimization Efficient ML
  • Formalizes embedding model routing as an adversarial contextual linear bandit problem.
  • Introduces a log-quadratic policy class for efficient online learning in model routing.
  • Develops the Hypentropy Policy Gradient (HPG) algorithm that adapts to low-rank structures.
  • Achieves O(s√MT) linearized policy regret, avoiding the curse of dimensionality.
Uncertainty Estimation and Generalization Bounds for Modern Deep Learning
Luis A. Ortega
Theory
  • Introduction of the Deep Variational Implicit Process (DVIP) for scalable Bayesian modeling.
  • Development of two methods (VaLLA and FMGP) for uncertainty calibration in deterministic networks.
  • Establishment of a unified probabilistic framework linking diversity, smoothness, and stochasticity in generalization.
  • Derivation of PAC-Chernoff bounds that explain double-descent behavior.
Smoothing Dark Areas in Molecular Latent Diffusion
Xi Wang, Jiahan Li, Yuxuan Xia, Yingcheng Wu, Shaoyi Zheng, Shengjie Wang
Generative Models Graph Learning Optimization
  • Identification of dark areas in molecular latent space that lead to invalid molecule generation.
  • Introduction of TopVAE, which incorporates topology and chemical constraints during training.
  • Demonstrated improvements in off-posterior robustness and molecular generation quality.
  • Achieved significant reductions in FCD3D metrics on QM9 and GEOM-Drugs datasets.
HAPI-EP: Towards Hybrid, Adaptive, and Predictive Digital Twins of Cardiac Electrophysiology
Sumeet Vadhavkar, Xiajun Jiang, Yubo Ye, Maryam Toloubidokhti, Linwei Wang
Theory Generative Models Optimization
  • HAPI addresses the challenges of dynamic adaptation and predictive capability in digital twins for cardiac electrophysiology.
  • The framework combines mechanistic models with neural networks to create a hybrid model that is interpretable and efficient.
  • HAPI enables rapid adaptation to live data through few-shot learning and meta-learning techniques.
  • The proposed approach ensures theoretical identifiability, enhancing the predictive performance of the digital twin.
Hierarchical ODE: Learning Continuous-Time Physical Prototypes for Early Link Failure Detection
Jiaen Lv, Leran Qi, Shaowei Wang
Time Series
  • Introduces a hierarchical ODE clustering network for time series prototype learning.
  • Effectively disentangles smooth trends from stochastic noise in signal data.
  • Autonomously determines the number of prototypes without rigid constraints.
  • Validated on early link failure detection with irregularly sampled time series.
Causal-Privacy Audit Workflow for Synthetic and Distilled Data in Dropout Support
Hanghang Zheng, Xiwei Zhuang, Zhong Wang, Hong Liu, Xiao Chen, Jingwen He, Xia Li
Theory
  • Introduces CaP-Eval, a workflow for auditing synthetic and distilled data in dropout support.
  • Demonstrates that DPGNet and distilled data preserve financial-status treatment effects better than other methods.
  • Highlights the importance of joint audits for predictive utility, privacy, and causal fidelity in educational data.
  • Identifies the need for careful consideration of data generation methods in institutional decision-making processes.
Small LLMs: Pruning vs. Training from Scratch
Yufeng Xu, Taiming Lu, Kunjun Li, Jiachen Zhu, Mingjie Sun, Zhuang Liu
Large Language Models Efficient ML NLP
  • Pruning provides a strong initialization advantage over random initialization for small LLMs.
  • The advantage of pruning diminishes as the pruning ratio increases and with extended training.
  • When given a full token budget, training from scratch can match or exceed the performance of coarser pruning methods.
  • Fine-grained pruning retains more knowledge from the parent model than coarser methods.
Graph-based Target Back-Propagation for Context Adaptation in Multi-LLM Agentic Systems
Tan Zhu, Tong Yao, Kananart Kuwaranancharoen, Amit Singh, Yushang Lai, Deepa Mohan, Shankara Bhargava
Large Language Models Graph Learning Optimization
  • GTBP introduces a graph-based framework for context adaptation in multi-LLM systems.
  • The method addresses credit assignment issues by propagating local target outputs rather than relying on gradients.
  • Theoretical analysis confirms the stability of prompt updates and the ability to decrease task-level objectives.
  • Empirical validation shows GTBP outperforms existing methods on benchmark datasets.
Beyond Defensive Reporting: Machine Learning for Active Anti-Money Laundering Control in Insurance
Dara Goldar, Geir Kjetil Ferkingstad Sandve, Martin Jullum
Theory Optimization
  • First empirical study of machine learning for money laundering detection in insurance claims.
  • Shift from passive reporting to active prevention in anti-money laundering strategies.
  • Incorporation of insurance fraud labels improves detection of laundering cases.
  • Introduction of the Budget-Weighted Capture Rate metric for evaluating model performance.
Utility-Constrained Policy Optimization
Mehrdad Moghimi, Bernardo Avila Pires
Reinforcement Learning Optimization Robotics
  • Introduces a modification to UCMDPs that allows for practical solutions and optimal Markov policies.
  • Presents utility-constrained policies (UCP), a Lagrangian deep RL algorithm for UCMDPs.
  • Demonstrates strong performance of UCP in Safety Gymnasium benchmarks, outperforming existing methods.
  • Allows for post-training adjustment of constraint limits, enhancing policy flexibility.
Benchmarking Instance-Dependent Label Noise with Controlled Corruptions
Shadman Islam, Agustinus Kristiadi, Mostafa Milani
Computer Vision Theory
  • CILN framework allows for explicit control over the source and severity of instance-dependent label noise.
  • The benchmarks created using CILN demonstrate realistic noise patterns that better reflect human uncertainty.
  • Corruption-mediated IDN can expose failure modes in existing noisy-label learning methods.
  • The study emphasizes the importance of noise structure in evaluating machine learning algorithms.
How Post-Training Shapes Biological Reasoning Models
Lukas Fesser, Hanlin Zhang, Michelle M. Li, Eric Wang, Bryan Perozzi, Shekoofeh Azizi, Sham M. Kakade, Marinka Zitnik
Multimodal
  • Post-training stages uniquely influence model generalization in biological reasoning.
  • CPT aligns models with biological language, improving downstream performance.
  • SFT enhances in-domain performance but can lead to over-specialization and a decline in out-of-domain performance.
  • Reinforcement learning can recover generalization when applied to strong SFT checkpoints.
Beyond Classification: A Cough Regression Benchmark for Respiratory Acoustic Foundation Models
Mayur Sanap, Prasanna Desikan, Edgar Lobaton
Audio & Speech
  • Introduces a cough regression benchmark evaluating five respiratory acoustic foundation models across multiple health targets.
  • Demonstrates that the MLP-small architecture consistently outperforms mean-predictor baselines and linear probing.
  • Finds that generative pretraining provides an advantage in age regression tasks.
  • Highlights an asymmetry in cross-dataset transfer: models trained on larger datasets transfer better to smaller clinical populations than the reverse.
Analytic Torsion and Spectral Gap Capture Persistent-Laplacian Performance
Jernej Grlj, Aaron D. Lauda
Graph Learning Theory Efficient ML
  • Persistent Laplacians offer a richer geometric representation than persistent homology but face challenges in high-dimensional data.
  • The proposed compact spectral representation combines Betti numbers, spectral gap, and analytic torsion, and captures the essential predictive signal.
  • The new representation outperforms traditional methods in some cases while reducing computational complexity and noise.
  • Analytic torsion serves as a mathematically grounded feature that simplifies the transition from raw spectral data to predictive performance.
MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents
Lawrence Keunho Jang, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov
NLP Large Language Models
  • MyPCBench addresses the gap in existing benchmarks by simulating a personal computing environment with a coherent user identity.
  • The benchmark includes 184 tasks that reflect real-world requests, enhancing the relevance of the evaluation.
  • The best-performing model, Claude Opus 4.6, solves only 55.4% of tasks, indicating significant room for improvement in multi-application interactions.
  • The study emphasizes the importance of personalization in evaluating agent performance, particularly in complex, multi-turn tasks.
Not all Jensen-Shannon Divergence Estimators are Equal
Alba Garrido, Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, Juan Parras
Theory
  • Different estimation protocols yield substantially different Jensen-Shannon divergence values.
  • Marginal estimators underestimate divergence by ignoring joint dependencies.
  • Classifier-based estimators are sensitive to model choice and data characteristics.
  • A closed-form posterior correction is proposed to address prior shift in classifier-based estimations.
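The claim that marginal estimators underestimate divergence can be seen in a small worked example. The sketch below builds two joint distributions over a pair of binary variables that share identical marginals but have opposite dependence structure, then compares the joint Jensen-Shannon divergence with the sum of per-variable divergences. Reading "marginal estimator" as a sum of per-feature divergences is an assumption made here for illustration, and the helper names `kl`, `jsd`, and `marginal` are hypothetical; the paper's estimators may be defined differently.

```python
import math


def kl(p: dict, q: dict) -> float:
    """KL(p || q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def jsd(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    keys = set(p) | set(q)
    p = {k: p.get(k, 0.0) for k in keys}
    q = {k: q.get(k, 0.0) for k in keys}
    m = {k: 0.5 * (p[k] + q[k]) for k in keys}  # mixture distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def marginal(joint: dict, axis: int) -> dict:
    """Marginalize a joint distribution over tuples onto one coordinate."""
    out = {}
    for outcome, prob in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + prob
    return out


# Illustrative distributions: perfectly correlated vs. perfectly anti-correlated bits.
p_joint = {(0, 0): 0.5, (1, 1): 0.5}
q_joint = {(0, 1): 0.5, (1, 0): 0.5}

joint_jsd = jsd(p_joint, q_joint)
marginal_jsd = sum(jsd(marginal(p_joint, a), marginal(q_joint, a)) for a in (0, 1))

print(joint_jsd, marginal_jsd)
```

Here the two joints have disjoint support, so the joint divergence reaches its maximum of 1 bit, while every marginal is uniform and the per-variable divergences vanish. All of the difference between the distributions lives in the dependence between the variables, which a marginal-only protocol cannot see.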
Learning Urban Access Costs from Origin-Destination Flows via Inverse Optimal Transport
Paula Joy B. Martinez
Optimization Interpretability
  • Introduces two inverse optimal transport models for estimating urban access costs.
  • Demonstrates the application of the framework to large-scale school choice data in the Philippines.
  • Estimates a subsidy-equivalent distance metric that quantifies perceived travel cost offsets.
  • Highlights the spatial variability of subsidy impacts on accessibility.
Provably Safe, Yet Scalable Reinforcement Learning
Kai S. Yun, Zeyang Li, Navid Azizan
Reinforcement Learning Robotics Theory
  • Introduction of the PS2-RL framework for scalable and provably safe RL.
  • Utilization of a learned backup policy to create implicit control-invariant sets.
  • Development of a novel safe-arrival value function for optimal policy training.
  • Implementation of a control-invariant layer for efficient end-to-end training.
Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model
Siyu Chen, Beining Wu, Miao Lu, Zhuoran Yang, Tianhao Wang
Theory Efficient ML
  • Neural networks can achieve optimal sample complexity for learning Gaussian single-index models.
  • The proposed algorithm adapts to various loss and activation functions, enhancing its applicability.
  • A novel weight perturbation technique is introduced to handle k-sparse signals effectively.
  • The results demonstrate that neural networks can match SQ lower bounds, addressing open problems in the field.
D2H-AD: A Hybrid Model Utilizing Hyperdimensional Computing for Advanced Anomaly Detection
Ghazal Ghajari, Elaheh Ghajari, Ashutosh Ghimire, Saeid Ataei, Faris Alsulami, Fathi Amsaad
Efficient ML Interpretability
  • D2H-AD utilizes Hyperdimensional Computing for improved anomaly detection.
  • The hybrid model combines distance-based similarity with density-aware encoding.
  • Ablation studies show significant performance improvements over traditional methods.
  • D2H-AD is lightweight and interpretable, suitable for edge AI applications.
Curvature-Guided Geometric Representation for Protein-Ligand Binding Affinity Prediction
Shuai Li, Chuan-Xian Ren, Yuhao Li, Ziqi Huang, Yue Pan, Mingzhe Tang, Hong Yan
Graph Learning
  • Introduction of RicciBind, a novel framework for PLA prediction.
  • Utilization of Ricci curvature for enhanced molecular structure representation.
  • Optimal transport mechanism for aligning protein-ligand clusters.
  • Significant improvements in predictive performance and interpretability.
Attention-Based Estimation of the Individual Treatment Benefit Probability under Dose Variation
Lev V. Utkin, Andrei V. Konstantinov, Stanislav K. Kogan, Natalya M. Verbova, Maksim I. Goriunov
Theory
  • Introduces a framework for estimating IPTB under dose variation, moving beyond binary treatment settings.
  • Utilizes attention mechanisms for effective aggregation of treatment effects from covariate-similar patient comparisons.
  • Demonstrates superior performance of the proposed method over traditional kernel regression approaches.
  • Provides a foundation for personalized dose selection based on individual-level treatment benefit probabilities.
Multi-Fidelity SINDy: Sparse Discovery of Nonlinear Dynamical Systems with Fidelity-Weighted Measurements
Filippo Zacchei, Ana Larrañaga, Attilio Frangi, Andrea Manzoni, Steven L. Brunton
Time Series Theory Interpretability
  • Introduction of a multi-fidelity SINDy framework that combines low- and high-fidelity data.
  • Statistical justification for a covariance-aware weighting strategy in regression.
  • Validation on benchmark systems shows improved model recovery despite noise.
  • Demonstrates that low-fidelity data can enhance model performance when high-fidelity data is scarce.
When Language Representations Interact: Separability and Cross-Lingual Effects in LLMs
Boris Marinov, Angira Sharma, Christian Schroeder de Witt, Philip Torr, Anisoara Calinescu, Jialin Yu
NLP Large Language Models Interpretability
  • Bilingual contrasts can be represented by stable, approximately linear directions in representation space.
  • A covariance-adjusted inner product reduces overlap between language directions, allowing for meaningful interpretation of residual structures.
  • Languages within the same family exhibit a simplex-like geometric organization, indicating hierarchical relationships.
  • Additive interventions reveal systematic effects at the logit level but limited control at the generation level, highlighting challenges in multilingual steering.
Context-aware Modality-Topology Co-Alignment for Multimodal Attributed Graphs
Sirui Zhang, Xu Wang, Zhengyu Wu, Xunkai Li, Hongchao Qin
Graph Learning Multimodal
  • CoMAG reframes MAG representation learning as task-adaptive context construction and modality-preserving alignment.
  • The framework integrates reliable context learning and modality-specific hop trajectories for improved performance.
  • CoMAG achieves state-of-the-art results across multiple graph-level and modality-level tasks.
  • The approach retains modality-specific evidence while enabling effective cross-modal alignment.
A Unified Causal-Origin Taxonomy of Distributional Shifts in Reinforcement Learning
Ardianto Wibowo, Paulo E Santos, Amer Baghdadi, Matthew Stephenson, Karl Sammut, Jean-Philippe Diguet
Reinforcement Learning Theory
  • Introduces a unified causal-origin taxonomy for distributional shifts in reinforcement learning.
  • Links ID/OOD generalization and non-stationarity under a common framework.
  • Decomposes the generative interaction process in RL to analyze the impact of various components on distributional shifts.
  • Distinguishes between internal and external shifts, and categorizes shifts into explicit, implicit, and hybrid types.
pFedUL: Layer-Aware Federated Unlearning for Personalized Federated Learning
Zhuodong Liu, Xiangyu Li, Zhihao Zhang
Federated Learning
  • pFedUL addresses the unique challenges of federated unlearning in personalized federated learning settings.
  • The framework includes a layer-aware approach that distinguishes between shared and personalized model components.
  • New metrics (PPS and CFI) are introduced to evaluate unlearning quality in pFL.
  • Experimental results show pFedUL achieves high personalized accuracy while effectively removing client data contributions.
Neural Slack Variables for Shape Constraints
Ruben Wiedemann, Antoine Jacquier, Lukas Gonon
Optimization Theory
  • Introduction of neural slack variables as a novel method for enforcing shape constraints in neural networks.
  • Demonstration of the 'constraint drifting' failure mode in traditional penalty and primal-dual methods.
  • Achieved zero violations in monotonicity and convexity test cases, outperforming existing methods.
  • Enabled arbitrage-free learning of volatility surfaces, addressing a critical challenge in quantitative finance.
Continuous Cross-Domain Traffic State Prediction via Memory-Augmented Graph Liquid Time-Constant Networks
Jinrong Xiang, Ming Xu
Graph Learning Time Series
  • Introduction of MA-GLTC framework for continuous cross-domain traffic prediction.
  • Utilization of spatio-temporal units (STUs) for fine-grained knowledge alignment.
  • Development of GLTC for modeling traffic dynamics with adaptive time constants.
  • Implementation of Memory-based Transfer Storage (MTS) for continual adaptation.
How to Score Experts for One-Shot MoE Expert Pruning: A Unified Formulation and Selection Principle
Zongfang Liu, Jinghui Zhang, Zijian Ma, Guangyi Chen, Xin Yuan
NLP Large Language Models Efficient ML
  • Introduces a unified formulation for one-shot MoE expert pruning based on routing frequency, gate weighting, and activation strength.
  • Establishes a selection principle for pruning criteria that varies between task-agnostic and task-specific scenarios.
  • Presents two new task-agnostic criteria, MAN and MSAN, which show superior performance compared to existing methods.
  • Demonstrates the effectiveness of the proposed criteria across four MoE models and 16 diverse benchmarks.
LapidaryEngine: Fully Conversational Crystal Generation
Yusei Ito, Yuta Suzuki, Tomoya Murata, Masaki Adachi
Generative Models NLP Large Language Models
  • LapidaryEngine enables fully conversational interaction for crystal generation.
  • It introduces a pivot representation for bidirectional translation between text and crystal structures.
  • The model allows iterative refinement based on user feedback, enhancing usability for non-experts.
  • Democratizes materials design by accepting vague natural-language prompts.