REG: A Regularization Optimizer for Robust Training Dynamics • arXiv:2510.03691 • Published Oct 4, 2025
ROOT: Robust Orthogonalized Optimizer for Neural Network Training • arXiv:2511.20626 • Published Nov 2025
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space • arXiv:2511.20102 • Published Nov 2025
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B • arXiv:2511.06221 • Published Nov 2025
Kimi Linear: An Expressive, Efficient Attention Architecture • arXiv:2510.26692 • Published Oct 30, 2025
Parallel Loop Transformer for Efficient Test-Time Computation Scaling • arXiv:2510.24824 • Published Oct 28, 2025
Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models • arXiv:2510.14961 • Published Oct 16, 2025
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? • arXiv:2510.02209 • Published Oct 2, 2025
Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction • arXiv:2510.01817 • Published Oct 2, 2025
Benchmarking Optimizers for Large Language Model Pretraining • arXiv:2509.01440 • Published Sep 1, 2025