Paper • Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning • 2510.25992 • Published Oct 29 • 44 upvotes
Article • Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation • Published Sep 16 • 15 upvotes
Paper • Towards General Agentic Intelligence via Environment Scaling • 2509.13311 • Published Sep 16 • 71 upvotes
Collection • SuperBPE • SuperBPE tokenizers and models trained with them • 9 items • Updated 20 days ago • 17 upvotes
Collection • 💧 LFM2 • LFM2 is a new generation of hybrid models designed for on-device deployment • 23 items • Updated 6 days ago • 125 upvotes
Collection • Hybrid Linear Attention Research • All 1.3B & 340M hybrid linear-attention experiments • 62 items • Updated Sep 11 • 12 upvotes
Collection • Avey 1 Research Preview • 1.5B preview models trained on 100B tokens of FineWeb, plus an instruct-tuned version on smoltalk • 3 items • Updated Jun 16 • 6 upvotes
Collection • V-JEPA 2 • A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13 • 173 upvotes
Collection • Falcon-H1 • Falcon-H1 family of hybrid-head language models (Transformer-SSM), available in 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B sizes (pretrained & instruction-tuned) • 38 items • Updated Nov 6 • 56 upvotes
Collection • Kimina Prover Preview • State-of-the-art models for formal mathematical reasoning • 5 items • Updated Apr 28 • 33 upvotes
Collection • Kimi-VL-A3B • Moonshot's efficient MoE VLMs, exceptional at agentic tasks, long context, and thinking • 7 items • Updated Oct 30 • 77 upvotes
Article • makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch • Published May 7, 2024 • 109 upvotes