Carnegie Mellon University

university

Verified

https://www.cmu.edu

Recent Activity

seungone authored a paper 6 days ago

RefineBench: Evaluating Refinement Capability of Language Models via Checklists

Papers

VanishD

authored a paper 11 days ago

Generalizable End-to-End Tool-Use RL with Synthetic CodeGym

Paper • 2509.17325 • Published Sep 22 • 1

bshook24

authored a paper 12 days ago

STAMP: Spatial-Temporal Adapter with Multi-Head Pooling

Paper • 2511.10848 • Published 24 days ago • 1

shikhar7ssu

authored a paper 20 days ago

OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder

Paper • 2507.14129 • Published Jul 18 • 9

shikhar7ssu

authored a paper about 1 month ago

POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

Paper • 2510.24992 • Published Oct 28 • 2

ethanning

authored a paper 2 months ago

Less LLM, More Documents: Searching for Improved RAG

Paper • 2510.02657 • Published Oct 3 • 2

shikhar7ssu

authored a paper 5 months ago

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Paper • 2507.06261 • Published Jul 7 • 64

THZed

authored a paper 6 months ago

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Paper • 2505.22094 • Published May 28 • 3

ethanning

authored a paper 6 months ago

DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research

Paper • 2505.19253 • Published May 25 • 32

yueqis

authored 4 papers 8 months ago

Beyond Browsing: API-Based Web Agents

Paper • 2410.16464 • Published Oct 21, 2024 • 2

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published Mar 10 • 101

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

Paper • 2504.07079 • Published Apr 9 • 12

VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge

Paper • 2504.10342 • Published Apr 14 • 10

JixuanLeng

authored a paper 8 months ago

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

Paper • 2504.00043 • Published Mar 30 • 9

JixuanLeng

authored a paper 11 months ago

S²FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity

Paper • 2412.06289 • Published Dec 9, 2024

viswavi

authored a paper about 1 year ago

Evaluating Language Models as Synthetic Data Generators

Paper • 2412.03679 • Published Dec 4, 2024 • 48

shikhar7ssu

authored 3 papers about 1 year ago

Learning to Answer Semantic Queries over Code

Paper • 2209.08372 • Published Sep 17, 2022

IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages

Paper • 2404.16816 • Published Apr 25, 2024 • 3

STAB: Speech Tokenizer Assessment Benchmark

Paper • 2409.02384 • Published Sep 4, 2024 • 1

yueqis

authored a paper about 1 year ago

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages

Paper • 2410.16153 • Published Oct 21, 2024 • 44

JixuanLeng

authored a paper about 1 year ago

Taming Overconfidence in LLMs: Reward Calibration in RLHF

Paper • 2410.09724 • Published Oct 13, 2024 • 3