Understanding and Harnessing Sparsity in Unified Multimodal Models • arXiv:2512.02351 • Published Dec 2025
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation • arXiv:2511.09611 • Published Nov 12, 2025
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought • arXiv:2511.02779 • Published Nov 4, 2025
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning • arXiv:2506.01713 • Published Jun 2, 2025
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers • arXiv:2410.13184 • Published Oct 17, 2024
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts • arXiv:2503.05066 • Published Mar 7, 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • arXiv:2501.12948 • Published Jan 22, 2025
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model • arXiv:2405.04434 • Published May 7, 2024
LLM-Drop (collection) • Model weights for the paper "What Matters in Transformers? Not All Attention is Needed" (https://arxiv.org/abs/2406.15786) • 14 items • Updated Oct 23, 2024
What Matters in Transformers? Not All Attention is Needed • arXiv:2406.15786 • Published Jun 22, 2024