Update app.py
app.py
CHANGED
|
@@ -394,38 +394,27 @@ def offline_run(files, batch_size: int, want_ts: bool):
|
|
| 394 |
with gr.Blocks(title="Parakeet-TDT v3: Streaming (Mic) + Offline (File)") as demo:
|
| 395 |
gr.Markdown(
|
| 396 |
"""
|
| 397 |
-
# FINALLY, A SIMPLE EXPLANATION OF THE NVIDIA NEMO TECHNICALS!
|
| 398 |
-
app itself is "not great not terrible" but if you have "the WILL" you will copy paste,
|
| 399 |
-
the code into AI of your choice and develop from there, good luck! (same goes for me :P)
|
| 400 |
|
| 401 |
-
This is a CHAD (smol/basic) version of the idea of local transcription in real-time
|
| 402 |
-
on cheap hardware, for example to use with a Raspberry Pi locally to
|
| 403 |
-
|
| 404 |
-
or any other stuff that you can imagine, it's modern programming - where the steps are defined by language
|
| 405 |
-
and the context - no more silly programming language syntax :D)
|
| 406 |
|
| 407 |
-
# Beam ASR = Google Maps for Speech - THE ANALOGY TO THE NVIDIA TECHNICAL TERMS USED IN THIS APP
|
| 408 |
-
NOTE
|
| 409 |
-
architecture of the Parakeet-TDT-v3 model, it's no joke anymore - soon, the
|
| 410 |
-
implemented - another game changer. Anyway, WORK ON THIS APP IS ONGOING; FOR TODAY, HERE YOU HAVE A SMOL CHAD COMMIT
|
| 411 |
|
| 412 |
**One-route vs many**
|
| 413 |
- Greedy: pick the first route and drive.
|
| 414 |
- Beam: keep several good routes, update with traffic, follow the best.
|
| 415 |
**Beam size**
|
| 416 |
- How many alternate routes you watch at once.
|
| 417 |
**Label-looping**
|
| 418 |
- Make back-to-back turns at the same intersection when signs are clear (e.g., "right, then immediate merge"). Faster, fewer stutters.
|
| 419 |
-
|
| 420 |
-
- Compare routes by ETA, not just distance. A longer route does not win just for having more segments.
|
| 421 |
-
**Blank / stability**
|
| 422 |
-
- The "wait" option at an intersection. If signals are unclear, pause instead of taking a wrong turn.
|
| 423 |
-
**Streaming (mic)**
|
| 424 |
-
- You only see a few blocks ahead (chunk + right-context). The planner keeps alternates and snaps overlapping segments so the path does not repeat or skip.
|
| 425 |
-
**File (batch)**
|
| 426 |
-
- Whole trip is visible. Same multi-route logic in one pass. No stitching needed.
|
| 427 |
-
**Trade-off**
|
| 428 |
-
- More routes watched -> better choices, higher planning cost. Beam 8 is a solid balance.
|
| 429 |
"""
|
| 430 |
)
|
| 431 |
|
| 394 |
with gr.Blocks(title="Parakeet-TDT v3: Streaming (Mic) + Offline (File)") as demo:
|
| 395 |
gr.Markdown(
|
| 396 |
"""
|
| 397 |
+
# FINALLY, A SIMPLE EXPLANATION OF THE NVIDIA NEMO TECHNICALS!
|
| 398 |
|
| 399 |
+
This is a CHAD (a smol, basic one, not a GIGA one) version of the idea of local real-time transcription
|
| 400 |
+
on cheap hardware, for example to use with a Raspberry Pi locally to say - do this, do that
|
| 401 |
+
- you can easily use another SMOL LLM to take an action in real time when you say "open the door
|
| 402 |
|
| 403 |
+
# Beam ASR = Google Maps for Speech - AN ANALOGY FOR THE NVIDIA TECHNICAL TERMS USED IN THIS APP
|
| 404 |
+
NOTE: this app does not use lame chunking of audio like others... We are fully compatible here with the modern
|
| 405 |
+
architecture of the Parakeet-TDT-v3 model, it's no joke anymore - soon, cache-aware streaming will be
|
| 406 |
+
implemented - another game changer. Anyway, WORK ON THIS APP IS ONGOING; FOR TODAY, HERE YOU HAVE A SMOL CHAD COMMIT
|
| 407 |
|
| 408 |
**One-route vs many**
|
| 409 |
- Greedy: pick the first route and drive.
|
| 410 |
- Beam: keep several good routes, update with traffic, follow the best.
|
| 411 |
+
|
| 412 |
**Beam size**
|
| 413 |
- How many alternate routes you watch at once.
|
| 414 |
+
|
| 415 |
**Label-looping**
|
| 416 |
- Make back-to-back turns at the same intersection when signs are clear (e.g., "right, then immediate merge"). Faster, fewer stutters.
|
| 417 |
+
|
| 418 |
"""
|
| 419 |
)
|
| 420 |
|
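The "one route vs many" and "beam size" analogy in the Markdown text above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `TOY_MODEL`, `next_log_probs`, and `beam_search` are hypothetical names invented here, the probabilities are made up, and this is generic beam search over a lookup table, not the TDT decoding that NeMo or this app actually runs.

```python
import math

# Hypothetical toy "model": next-token probabilities given the prefix so far.
# The numbers are invented so that the locally best first step ("a")
# leads to a worse overall route than the alternative ("b").
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def next_log_probs(prefix):
    """Return log-probabilities of each next token for a given prefix."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[tuple(prefix)].items()}


def beam_search(steps, beam_size):
    """Keep the `beam_size` best partial routes at every step.

    With beam_size=1 this degenerates to greedy decoding.
    Returns the best (path, total_log_prob) pair after `steps` steps.
    """
    beams = [((), 0.0)]  # (path so far, accumulated log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for tok, lp in next_log_probs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        # Rank all extended routes and keep only the top `beam_size`.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0]


greedy_path, greedy_score = beam_search(steps=2, beam_size=1)
beam_path, beam_score = beam_search(steps=2, beam_size=2)

print("greedy:", greedy_path, round(math.exp(greedy_score), 2))
print("beam=2:", beam_path, round(math.exp(beam_score), 2))
```

In this toy table, greedy commits to "a" because it looks best at the first intersection, while a beam of 2 also keeps "b" alive and ends up on the higher-probability route. A larger beam means more candidates scored per step, which is the planning cost the analogy refers to.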