namespace-Pt/beacon-qwen-2-7b-instruct

Text Generation

text-generation-inference

namespace-Pt committed on Oct 15, 2024

Commit 30b1390 · verified · 1 Parent(s): bf9b66d

Update README.md

Files changed (1)

README.md +2 -26

README.md CHANGED

@@ -6,15 +6,7 @@ pipeline_tag: text-generation
 # Intro
-[Activation Beacon](https://arxiv.org/abs/2401.03462) compresses the original KV into fewer yet more compact states (a.k.a. beacons) and hence enables the LLM to perceive longer context given its fixed context window. It is known for the following features:
-- **Effective**
-  - there is little information loss given a compression ratio of 2, 4, and 8;
-- **Efficient**
-  - it drastically reduces the GPU consumption of KV cache;
-- **Compatible**
-  - it can work together with position extrapolation (e.g. YaRN) to further extends the context length; it can also work with grouped query attention to further reduce the KV cache size;
-- **Low-Cost**
-  - it is light-weight and can be efficiently trained with roughly 1B tokens.
 # Environment
 ```
@@ -63,20 +55,4 @@ with torch.no_grad():
   print(f"Answers:      {example['answer']}")
   print(f"Prediction:   {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
 ```
-**NOTE**: It's okay to see warnings like `This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (32768). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.` Just ignore it.
-# Results
-## LongBench
-| Model                         | Single QA | Multi QA | Summarization | Few-Shot | Code  | AVG    |
-|-------------------------------|-----------|----------|---------------|----------|-------|--------|
-| qwen-2-7b-instruct            | 39.60     | 36.92    | 27.97         | 71.12    | 62.34 | 47.59  |
-| beacon-qwen-2-7b-instruct     | 40.76     | 43.73    | 27.23         | 68.87    | 68.47 | 49.81  |
-## NIAH
-![](needle.png)

 # Intro
+[Activation Beacon](https://arxiv.org/abs/2401.03462) is a plug-in module to transformer-based LLMs that enables effective, efficient, and flexible compression of long contexts.
 # Environment
 ```
   print(f"Answers:      {example['answer']}")
   print(f"Prediction:   {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
 ```
+**NOTE**: It's okay to see warnings like `This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (32768). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.` Just ignore it.
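
For readers who would rather hide the reminder than ignore it, here is a minimal sketch using only Python's standard `logging` module. It assumes the message is emitted through `logging`, which this README does not confirm. The logger name `demo.generation`, the `DropLengthReminder` filter, and the `ListHandler` helper are hypothetical names for illustration and are not part of this repository or of any library.

```python
import logging


class DropLengthReminder(logging.Filter):
    """Discard records that start with the reminder text quoted in the NOTE."""

    PREFIX = "This is a friendly reminder - the current text generation call"

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging machinery to drop the record.
        return not record.getMessage().startswith(self.PREFIX)


class ListHandler(logging.Handler):
    """Hypothetical helper: collect messages in a list so the demo is checkable."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Hypothetical logger name; the real emitting logger is an assumption to verify.
demo_logger = logging.getLogger("demo.generation")
demo_logger.setLevel(logging.WARNING)
demo_logger.propagate = False

handler = ListHandler()
# Attach the filter to the handler: a filter on a parent logger does not
# see records created by its child loggers, but a handler filter does.
handler.addFilter(DropLengthReminder())
demo_logger.addHandler(handler)

demo_logger.warning(
    "This is a friendly reminder - the current text generation call will "
    "exceed the model's predefined maximum length (32768)."
)
demo_logger.warning("Some other warning")
```

Only the second message reaches the handler. The same filter could be attached to whichever handler actually prints the reminder in your environment, once you have confirmed where it originates.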