Wei Xiong

weqweasdas

https://weixiongust.github.io/WeiXiongUST/index.html

AI & ML interests

Machine learning, RLHF

Recent Activity

upvoted a paper about 1 month ago

Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning

updated a dataset about 1 month ago

weqweasdas/qwen15b_train_simple_subset5k_for_difficulty_transition

published a dataset about 1 month ago

weqweasdas/qwen15b_train_simple_subset5k_for_difficulty_transition

commented a paper 2 months ago

Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

Paper • 2510.04996 • Published Oct 6 • 15

commented 2 papers 8 months ago

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

Paper • 2504.11343 • Published Apr 15 • 19

commented 4 papers 9 months ago

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26 • 83

New activity in RLHFlow/LLaMA3-SFT about 1 year ago

LLaMA3.1-SFT

#3 opened about 1 year ago by

New activity in RLHFlow/LLaMA3-SFT over 1 year ago

How to use llama 3sft model, pipeline or tokenizer.apply_chat_template. Can you provide a simple example? Thank you very much for your contribution

#2 opened over 1 year ago by

Missing BOS token in tokenized text

#1 opened over 1 year ago by

New activity in RLHFlow/ArmoRM-Llama3-8B-v0.1 over 1 year ago

Special tokens in the vocabulary?

#13 opened over 1 year ago by

New activity in sfairXC/FsfairX-LLaMA3-RM-v0.1 over 1 year ago

TypeError: Got unsupported ScalarType BFloat16

#5 opened over 1 year ago by

New activity in RLHFlow/pair-preference-model-LLaMA3-8B over 1 year ago

Could you please test the consistency of preference between `RLHFlow/pair-preference-model-LLaMA3-8B` and GPT-4 on alpacaeval dataset?

#2 opened over 1 year ago by

commented a paper over 1 year ago

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13, 2024 • 71

New activity in weqweasdas/RM-Mistral-7B over 1 year ago

why vocab size is 32001

#3 opened over 1 year ago by

License

#2 opened over 1 year ago by

Fix dataset link

#1 opened over 1 year ago by