switch to gradio
Files changed:

- README.md +42 -18
- app.py +111 -120
- requirements.txt +1 -3
README.md
CHANGED
@@ -3,36 +3,36 @@ title: Dinercall Intent Demo
 emoji: 🏆
 colorFrom: red
 colorTo: gray
-sdk:
-sdk_version:
+sdk: gradio
+sdk_version: 5+
 app_file: app.py
 pinned: false
 license: apache-2.0
 short_description: restaurant reservation intent detector
 ---

-
 # 🍽️ 餐廳訂位意圖識別系統 (Mandarin Reservation Intent Classifier)

-🎙️
+🎙️ 本系統讓使用者可以透過**語音錄音**或**文字輸入**,自動判斷是否具有「訂位意圖」,是語音助理或自動客服前端的理想元件之一。這個版本基於 **Gradio** 建構,具有簡單直觀的分頁式輸入模式切換(「麥克風」或「文字」)。

 ---

 ## 🔍 功能介紹

 - 🧠 **語音辨識**:使用 fine-tuned Whisper 模型 [`Jingmiao/whisper-small-zh_tw`](https://huggingface.co/Jingmiao/whisper-small-zh_tw) 將語音轉為繁體中文文字。
-- 🤖 **意圖分類**:使用微調的 ALBERT
+- 🤖 **意圖分類**:使用微調的 ALBERT 中文模型或 Qwen 模型判斷輸入是否包含訂位意圖。
 - 📱 **支援手機與桌機**:介面具備良好響應性,適用於各類瀏覽器與行動裝置。
-- 🔊
+- 🔊 **雙重輸入模式**:使用者可在「麥克風」和「文字」兩種模式間切換,以提供語音或手動輸入。

 ---

 ## 🚀 使用方式

-1.
-
-
-
+1. 選擇輸入模式:
+   - 「麥克風」:點擊錄音按鈕開始錄音,錄製完成後自動轉文字並判斷意圖。
+   - 「文字」:直接在文字框中輸入語句,再點擊「執行辨識」按鈕。
+2. 從下拉選單選擇使用的模型(例如 ALBERT-tiny、ALBERT-base 或 Qwen)。
+3. 按下「執行辨識」後,系統將顯示轉換後的文字、意圖判斷結果,並以 TTS(語音合成)的方式回應。

 ---

@@ -44,29 +44,53 @@ short_description: restaurant reservation intent detector
 ### 中文意圖分類模型:
 - [`Luigi/albert-tiny-chinese-dinercall-intent`](https://huggingface.co/Luigi/albert-tiny-chinese-dinercall-intent)
 - [`Luigi/albert-base-chinese-dinercall-intent`](https://huggingface.co/Luigi/albert-base-chinese-dinercall-intent)
+- 或使用 [`Qwen/Qwen2.5-0.5B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)(透過 Outlines 整合)

 ---

 ## 📦 依賴環境

 ```txt
-
-
+llama-cpp-python
+gradio>=5.0.0
+transformers
 torch
-
+soundfile
+outlines
+numpy>=1.24,<2.0
+kokoro
+huggingface-hub
+jieba
+docopt
+ordered-set
+cn2an
+pypinyin
+sentencepiece
 ```

 ---

 ## 🛠️ 開發者備註

--
--
+- 本應用現改為 Gradio App,適合在 Hugging Face Spaces 上部署,並支援 Gradio V5 的最新功能。
+- 採用雙重輸入模式(麥克風與文字)讓使用者能靈活切換輸入方式。
 - 若需延伸本系統至其他語言或多輪對話,歡迎 fork 本專案進行改造!

 ---

-© 2024 by [Your Name or Team]. Made with ❤️ using Hugging Face +
-
+© 2024 by [Your Name or Team]. Made with ❤️ using Hugging Face + Gradio.
+---
+
+### Explanation
+
+- **README.md:**
+  - The SDK and app_file information has been updated to indicate a Gradio-based application.
+  - The features have been revised to highlight the dual-input mode (麥克風 vs. 文字).
+  - The installation instructions and usage steps now reflect the updated Gradio interface.
+
+- **requirements.txt:**
+  - The dependencies for Streamlit and streamlit-mic-recorder have been removed.
+  - Gradio (version 5.0.0 or higher) has been added as the primary UI framework.
+  - The remaining dependencies support the models and other processing components.

-
+Feel free to customize further as needed for your deployment or additional features!
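Aside: the two ALBERT checkpoints listed in the new README can also be queried outside the Space. Below is a minimal standalone sketch that mirrors the `load_transformers_model` / `predict_intent` path in the new app.py (shown in the next diff) and relies only on the checkpoint's own `id2label` mapping; the example sentence is made up.

```python
# Standalone check of one of the intent classifiers used by the Space.
import torch
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Luigi/albert-tiny-chinese-dinercall-intent"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "你好,我想訂明天晚上七點,四位"  # hypothetical caller utterance
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    probs = softmax(model(**inputs).logits, dim=-1)[0]

# Report the score for every label the checkpoint defines.
for idx, score in enumerate(probs.tolist()):
    print(f"{model.config.id2label[idx]}: {score:.2%}")
```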
app.py
CHANGED
@@ -1,56 +1,58 @@
-import
-from
-from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
-import outlines # Use outlines with transformers integration
-from torch.nn.functional import softmax
+import gradio as gr
+from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
 import torch
+from torch.nn.functional import softmax
+import numpy as np
+import soundfile as sf
+import io
 import tempfile
+import outlines  # For Qwen integration via outlines
+import kokoro  # For TTS synthesis
 import re
 from pathlib import Path
-import
-import
-import numpy as np
-import soundfile as sf
-from kokoro import KPipeline
+from functools import lru_cache
+import warnings

-#
+# Suppress FutureWarnings (e.g. about using `inputs` vs. `input_features`)
+warnings.filterwarnings("ignore", category=FutureWarning)

-#
+# ------------------- Model Identifiers -------------------
 whisper_model_id = "Jingmiao/whisper-small-zh_tw"
-
-# Qwen LLM model identifier (using outlines transformers integration)
 qwen_model_id = "Qwen/Qwen2.5-0.5B-Instruct"

-# Available models for text classification (intent detection) via Transformers
 available_models = {
     "ALBERT-tiny (Chinese)": "Luigi/albert-tiny-chinese-dinercall-intent",
     "ALBERT-base (Chinese)": "Luigi/albert-base-chinese-dinercall-intent",
-    "Qwen (via Transformers - outlines)": "qwen"
+    "Qwen (via Transformers - outlines)": "qwen"
 }

-#
-
-@st.cache_resource
+# ------------------- Caching and Loading Functions -------------------
+@lru_cache(maxsize=1)
 def load_whisper_pipeline():
-
-
-
-
-
+    pipe = pipeline("automatic-speech-recognition", model=whisper_model_id)
+    # Move model to GPU if available for faster inference
+    if torch.cuda.is_available():
+        pipe.model.to("cuda")
+    return pipe
+
+@lru_cache(maxsize=2)
+def load_transformers_model(model_id: str):
     tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
     model = AutoModelForSequenceClassification.from_pretrained(model_id)
+    if torch.cuda.is_available():
+        model.to("cuda")
     return tokenizer, model

-@
+@lru_cache(maxsize=1)
 def load_qwen_model():
-    # Load Qwen using the outlines transformers integration.
-    # Note that the prompt-based interaction requires proper chat tokens.
     return outlines.models.transformers(qwen_model_id)

-
+@lru_cache(maxsize=1)
+def get_tts_pipeline():
+    return kokoro.KPipeline(lang_code="z")

-
-
+# ------------------- Inference Functions -------------------
+def predict_with_qwen(text: str):
     model = load_qwen_model()
     prompt = f"""
 <|im_start|>system

@@ -76,10 +78,11 @@ Classify the following message: "{text}"
     else:
         return f"未知回應: {prediction}"

-def predict_intent(text, model_id):
-    # Use ALBERT-based Transformers for intent detection.
+def predict_intent(text: str, model_id: str):
     tokenizer, model = load_transformers_model(model_id)
     inputs = tokenizer(text, return_tensors="pt")
+    if torch.cuda.is_available():
+        inputs = {k: v.to("cuda") for k, v in inputs.items()}
     with torch.no_grad():
         logits = model(**inputs).logits
     probs = softmax(logits, dim=-1)

@@ -89,20 +92,7 @@ def predict_intent(text, model_id):
     else:
         return f"❌ 無訂位意圖 (Not Reservation intent)(訂位信心度 Confidence: {confidence:.2%})"

-def
-    text = Path(path).read_text(encoding="utf-8")
-    text = re.sub(r"(?s)^---.*?---", "", text).strip()
-    text = re.sub(r"^# .*?\n+", "", text)
-    return text
-
-# ------------------ TTS Integration via kokoro ------------------
-
-@st.cache_resource
-def get_tts_pipeline():
-    # Instantiate and cache the KPipeline for TTS; setting language code to Chinese.
-    return KPipeline(lang_code="z")
-
-def get_tts_message(intent_result):
+def get_tts_message(intent_result: str):
     if intent_result and "訂位意圖" in intent_result and "無" not in intent_result:
         return "稍後您將會從簡訊收到訂位連結"
     elif intent_result:

@@ -110,7 +100,7 @@ def get_tts_message(intent_result):
     else:
         return "未能判斷意圖"

-def play_tts_message(message, voice='af_heart'):
+def tts_audio_output(message: str, voice: str = 'af_heart'):
     pipeline_tts = get_tts_pipeline()
     generator = pipeline_tts(message, voice=voice)
     audio_chunks = []

@@ -118,78 +108,79 @@ def play_tts_message(message, voice='af_heart'):
         audio_chunks.append(audio)
     if audio_chunks:
         audio_concat = np.concatenate(audio_chunks)
-    else:
-
-
-
-
-        return
-
-
-
-
-
-
-
-
-
-
-
-#
-
-
-
-
-
-
-
-
-
-
-# Process audio recording input
-if audio:
-    st.success("錄音完成!")
-    st.audio(audio["bytes"], format="audio/wav")
-    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmpfile:
-        tmpfile.write(audio["bytes"])
-        tmpfile_path = tmpfile.name
-
-    with st.spinner("🧠 Whisper 處理語音中..."):
-        try:
-            whisper_pipe = load_whisper_pipeline()
-            result = whisper_pipe(tmpfile_path)
-            transcription = result["text"]
-            st.success(f"📝 語音轉文字:{transcription}")
-        except Exception as e:
-            st.error(f"❌ Whisper 錯誤:{str(e)}")
-            transcription = ""
-
-    if transcription:
-        with st.spinner("預測中..."):
-            if model_id == "qwen":
-                result_text = predict_with_qwen(transcription)
-            else:
-                result_text = predict_intent(transcription, model_id)
-        st.success(result_text)
-        tts_text = get_tts_message(result_text)
-        st.info(f"TTS 語音內容: {tts_text}")
-        audio_message = play_tts_message(tts_text)
-        play_audio_auto(audio_message, mime="audio/wav")
-
-# Process text input for intent classification
-text_input = st.text_input("✍️ 或手動輸入語句")
-if text_input and st.button("🚀 送出"):
-    with st.spinner("預測中..."):
-        if model_id == "qwen":
-            result_text = predict_with_qwen(text_input)
-        else:
-
-
-
-
-
-
-
-with
-
-
+        # Return as tuple (sample_rate, numpy_array) for gr.Audio (sample rate used: 24000 Hz)
+        return (24000, audio_concat)
+    else:
+        return None
+
+def transcribe_audio(audio_file):
+    whisper_pipe = load_whisper_pipeline()
+    # audio_file is the file path from gr.Audio (with type="filepath")
+    result = whisper_pipe(audio_file)
+    return result["text"]
+
+# ------------------- Main Processing Function -------------------
+def classify_intent(mode, audio_file, text_input, model_choice):
+    # Determine input based on explicit mode.
+    if mode == "Microphone" and audio_file is not None:
+        transcription = transcribe_audio(audio_file)
+    elif mode == "Text" and text_input:
+        transcription = text_input
+    else:
+        return "請提供語音或文字輸入", "", None
+
+    # Classify the transcribed or provided text.
+    if available_models[model_choice] == "qwen":
+        classification = predict_with_qwen(transcription)
+    else:
+        classification = predict_intent(transcription, available_models[model_choice])
+    # Generate TTS message and audio.
+    tts_msg = get_tts_message(classification)
+    tts_audio = tts_audio_output(tts_msg)
+    return transcription, classification, tts_audio
+
+# ------------------- Gradio Blocks Interface Setup -------------------
+with gr.Blocks() as demo:
+    gr.Markdown("## 🍽️ 餐廳訂位意圖識別")
+    gr.Markdown("錄音或輸入文字,自動判斷是否具有訂位意圖。")
+
+    with gr.Row():
+        # Input Mode Selector
+        mode = gr.Radio(choices=["Microphone", "Text"], label="選擇輸入模式", value="Microphone")
+
+    with gr.Row():
+        # Audio and Text inputs – only one will be visible based on mode selection.
+        audio_input = gr.Audio(sources=["microphone"], type="filepath", label="語音輸入 (點擊錄音)")
+        text_input = gr.Textbox(lines=2, placeholder="請輸入文字", label="文字輸入")
+
+    # Initially, only the microphone input is visible.
+    text_input.visible = False
+
+    # Change event for mode selection to toggle visibility.
+    def update_visibility(selected_mode):
+        if selected_mode == "Microphone":
+            return gr.update(visible=True), gr.update(visible=False)
+        else:
+            return gr.update(visible=False), gr.update(visible=True)
+    mode.change(fn=update_visibility, inputs=mode, outputs=[audio_input, text_input])
+
+    with gr.Row():
+        model_dropdown = gr.Dropdown(choices=list(available_models.keys()),
+                                     value="ALBERT-tiny (Chinese)", label="選擇模型")
+
+    with gr.Row():
+        classify_btn = gr.Button("執行辨識")
+
+    with gr.Row():
+        transcription_output = gr.Textbox(label="轉換文字")
+    with gr.Row():
+        classification_output = gr.Textbox(label="意圖判斷結果")
+    with gr.Row():
+        tts_output = gr.Audio(type="numpy", label="TTS 語音輸出")
+
+    # Button event triggers the classification. Gradio will show a spinner during processing.
+    classify_btn.click(fn=classify_intent,
+                       inputs=[mode, audio_input, text_input, model_dropdown],
+                       outputs=[transcription_output, classification_output, tts_output])
+
+demo.launch()
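Note that the hunks above skip the body of `predict_with_qwen` (the chat prompt and the generation call), so only the `outlines.models.transformers(qwen_model_id)` loader is visible. A rough sketch of how such a constrained classification step could look with the outlines API follows; the prompt wording and the two choice strings are illustrative assumptions, not the Space's actual code.

```python
# Hypothetical reconstruction of the elided Qwen classification step.
import outlines

qwen_model_id = "Qwen/Qwen2.5-0.5B-Instruct"
model = outlines.models.transformers(qwen_model_id)  # as in load_qwen_model() above

def classify_reservation(text: str) -> str:
    # ChatML-style prompt; the exact system instruction used by the Space is not shown in the diff.
    prompt = (
        "<|im_start|>system\n"
        "你是餐廳訂位意圖分類器,只回答「訂位」或「非訂位」。<|im_end|>\n"
        f"<|im_start|>user\n{text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    # outlines.generate.choice constrains decoding to one of the listed strings.
    generator = outlines.generate.choice(model, ["訂位", "非訂位"])
    return generator(prompt)

print(classify_reservation("我想訂位,今晚八點,兩位"))
```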
requirements.txt
CHANGED
@@ -3,11 +3,9 @@
 --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu

 llama-cpp-python
-
-streamlit-mic-recorder
+gradio>=5.0.0
 transformers
 torch
-faster-whisper
 soundfile
 outlines
 numpy>=1.24,<2.0
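To hear the TTS response locally after installing these requirements, the kokoro pipeline used by app.py can be exercised on its own. A small sketch, assuming kokoro's usual `(graphemes, phonemes, audio)` yield and the 24 kHz rate that app.py reports to `gr.Audio`:

```python
# Offline check of the Space's TTS path (kokoro + soundfile from the list above).
import numpy as np
import soundfile as sf
from kokoro import KPipeline

pipeline_tts = KPipeline(lang_code="z")  # Chinese, as in get_tts_pipeline()
chunks = []
for _, _, audio in pipeline_tts("稍後您將會從簡訊收到訂位連結", voice="af_heart"):
    chunks.append(audio)

if chunks:
    # app.py returns (24000, np.concatenate(chunks)) to gr.Audio; here we write a wav file instead.
    sf.write("tts_check.wav", np.concatenate(chunks), 24000)
```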