WJ88 committed on
Commit 6331162 · verified · 1 Parent(s): 97d8556

Update app.py

Files changed (1)
  1. app.py +11 -22
app.py CHANGED
@@ -394,38 +394,27 @@ def offline_run(files, batch_size: int, want_ts: bool):
  with gr.Blocks(title="Parakeet-TDT v3: Streaming (Mic) + Offline (File)") as demo:
  gr.Markdown(
  """
- # FINALLY, SIMPLE EXPLANATION OF THE NVIDIA NEMO TECHNICALS! (tl;dr scroll down for kinda working realtime transcript and file upload and transcript)
- app itself is "not great not terrible" but if you have "the WILL" you will copy paste,
- the code into AI of your choice and develop from there, good luck! (same goes for me :P)
+ # FINALLY, SIMPLE EXPLANATION OF THE NVIDIA NEMO TECHNICALS!
 
- This is a CHAD (smol/basic) version of the idea of local transcription in real-time
- on cheap hardware, for example to use with rapsberry pi locally to "do this, do that"
- (you can easily use another SMOL LLM to in real-time take an action when you say "open the door",
- or any other stuff that you can imagine, its a modern programming - whre the steps are defined by language
- and the context - no more silly programming language syntax :D)
+ This is a CHAD (smol one, not a GIGA one/basic) version of the idea of local transcription in real-time
+ on cheap hardware, for example to use with rapsberry pi locally to say - do this, do that
+ - you can easily use another SMOL LLM to in real-time take an action when you say "open the door
 
- # Beam ASR = Google Maps for Speech - THE ANALOGY TO THE NVIDIA TECHNICAL TERMS USED IN THIS APP
- NOTE: this app is not using lame chunking of audio like others... We are fully compatible here with modern
- architecture of Parakeet-TDT-v3 model, its no joke anymore - soon, the "cache- streaming" will be
- implemented - another gamechanger, anyway - THE WORK IS ONGOING ON THIS APP, FOR TODAY HERE YOU HAVE SMOL CHAD COMMIT XD
+ # Beam ASR = Google Maps for Speech - THE ANALOGY OF USED FEATURES TO THE NVIDIA TECHNICAL TERMS USED IN THIS APP
+ NOTE this app is not using lame chunking of audio like others... We are fully compatible here with modern
+ architecture of Parakeet-TDT-v3 model, its no joke anymore - soon, the cache-aware streaming will be
+ implemented - another gamechanger, anyway - THE WORK IS ONGOING ON THIS APP, FOR TODAY HERE YOU HAVE SMOL CHAD COMMIT
 
  **One-route vs many**
  - Greedy: pick the first route and drive.
  - Beam: keep several good routes, update with traffic, follow the best.
+
  **Beam size**
  - How many alternate routes you watch at once.
+
  **Label-looping**
  - Make back-to-back turns at the same intersection when signs are clear (e.g., "right, then immediate merge"). Faster, fewer stutters.
- **Fair scoring (length-norm)**
- - Compare routes by ETA, not just distance. A longer route does not win just for having more segments.
- **Blank / stability**
- - The "wait" option at an intersection. If signals are unclear, pause instead of taking a wrong turn.
- **Streaming (mic)**
- - You only see a few blocks ahead (chunk + right-context). The planner keeps alternates and snaps overlapping segments so the path does not repeat or skip.
- **File (batch)**
- - Whole trip is visible. Same multi-route logic in one pass. No stitching needed.
- **Trade-off**
- - More routes watched -> better choices, higher planning cost. Beam 8 is a solid balance.
+
  """
  )
 
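The "One-route vs many" and "Beam size" entries in the Markdown above describe greedy decoding versus beam search. The following is a minimal sketch, assuming a hypothetical toy next-token table. `TOY_MODEL`, `next_logprobs`, `greedy_decode` and `beam_decode` are made up for illustration and are not part of app.py or NeMo. It shows why watching several routes can beat always taking the locally best turn:

```python
import math
from typing import Dict, List, Tuple

Prefix = Tuple[str, ...]

# Hypothetical toy "model": next-token probabilities given the prefix so far.
# Chosen so that the locally best first token leads to a worse full sequence.
TOY_MODEL: Dict[Prefix, Dict[str, float]] = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.4, "y": 0.35, "z": 0.25},
    ("b",): {"x": 0.9, "y": 0.05, "z": 0.05},
}


def next_logprobs(prefix: Prefix) -> Dict[str, float]:
    """Log-probabilities of each possible next token after `prefix`."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[prefix].items()}


def greedy_decode(steps: int) -> Tuple[Prefix, float]:
    """One route: always take the single most likely next token."""
    prefix: Prefix = ()
    score = 0.0
    for _ in range(steps):
        tok, lp = max(next_logprobs(prefix).items(), key=lambda kv: kv[1])
        prefix += (tok,)
        score += lp
    return prefix, score


def beam_decode(steps: int, beam_size: int) -> Tuple[Prefix, float]:
    """Many routes: keep the `beam_size` best partial hypotheses at every step."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(steps):
        candidates = [
            (prefix + (tok,), score + lp)
            for prefix, score in beams
            for tok, lp in next_logprobs(prefix).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]  # prune to the best routes
    return beams[0]


if __name__ == "__main__":
    print(greedy_decode(2)[0])             # ('a', 'x')
    print(beam_decode(2, beam_size=2)[0])  # ('b', 'x')
```

Greedy takes "a" (0.6) but can only continue with probability 0.4, for a total of 0.24. A beam of 2 keeps "b" alive and finds "b x" with 0.4 × 0.9 = 0.36. Every extra beam adds candidates to score at each step, which is the planning-cost side of the trade-off.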
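The removed "Fair scoring (length-norm)" entry refers to length normalization. With summed log-probabilities, every extra token makes a hypothesis score more negative, so raw scores tend to favour shorter hypotheses. Normalizing by length compares per-token quality instead. The following is a minimal sketch with made-up scores. Dividing by `len ** alpha` is one common form and is assumed here; the diff does not show which formula, if any, app.py uses:

```python
import math
from typing import Callable, Dict, List


def raw_score(token_logprobs: List[float]) -> float:
    """Plain sum of per-token log-probabilities (more tokens -> more negative)."""
    return sum(token_logprobs)


def length_normalized_score(token_logprobs: List[float], alpha: float = 1.0) -> float:
    """Sum of log-probs divided by length**alpha (alpha=0 gives the raw score)."""
    if not token_logprobs:
        raise ValueError("cannot score an empty hypothesis")
    return sum(token_logprobs) / (len(token_logprobs) ** alpha)


def pick_best(hyps: Dict[str, List[float]], scorer: Callable[[List[float]], float]) -> str:
    """Name of the hypothesis with the highest score under `scorer`."""
    return max(hyps, key=lambda name: scorer(hyps[name]))


# Made-up per-token log-probs for two competing transcripts of the same audio.
HYPS: Dict[str, List[float]] = {
    "short": [math.log(0.6)] * 2,  # 2 tokens at p=0.6, sum is about -1.02
    "long": [math.log(0.7)] * 3,   # 3 tokens at p=0.7, sum is about -1.07
}

if __name__ == "__main__":
    print(pick_best(HYPS, raw_score))                # short
    print(pick_best(HYPS, length_normalized_score))  # long
```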
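The "Label-looping" entry and the removed "Blank / stability" entry both concern how a transducer emits labels. At each audio frame, the model may emit several labels back-to-back, or it may output blank, meaning "nothing (more) to emit here, move on". The following sketch is the classic frame-by-frame greedy transducer loop, with a hypothetical scripted `toy_joint` standing in for the model. It is not NeMo's label-looping decoder, which reorganizes these loops for efficient batched decoding. It also ignores the duration predictions that TDT models add:

```python
from typing import Callable, Dict, List, Sequence

BLANK = "<blank>"


def greedy_transducer_decode(
    num_frames: int,
    joint: Callable[[int, Sequence[str]], Dict[str, float]],
    max_symbols_per_frame: int = 3,
) -> List[str]:
    """Frame-by-frame greedy decoding for a transducer-style model.

    `joint(t, hyp)` returns {label: probability} for frame t given the labels
    emitted so far. Blank means "nothing (more) to emit at this frame".
    """
    hyp: List[str] = []
    for t in range(num_frames):
        # Inner loop: several labels may be emitted back-to-back on one frame.
        for _ in range(max_symbols_per_frame):
            probs = joint(t, hyp)
            label = max(probs, key=probs.get)
            if label == BLANK:
                break  # "wait": advance to the next frame without emitting
            hyp.append(label)
    return hyp


# Hypothetical scripted model: by the end of frame t, this many labels should exist.
TARGET = ["h", "i", "!"]
EMIT_UNTIL = {0: 2, 1: 2, 2: 3}


def toy_joint(t: int, hyp: Sequence[str]) -> Dict[str, float]:
    if len(hyp) < EMIT_UNTIL[t]:
        return {TARGET[len(hyp)]: 0.8, BLANK: 0.2}
    return {BLANK: 0.9, "h": 0.1}


if __name__ == "__main__":
    print(greedy_transducer_decode(3, toy_joint))  # ['h', 'i', '!']
```

Frame 0 emits two labels before blank ("back-to-back turns"), frame 1 emits only blank ("wait"), and frame 2 emits one label. `max_symbols_per_frame` caps runaway emission on a single frame.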
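The removed "Streaming (mic)" entry describes seeing only one chunk plus some right context at a time, then snapping overlapping pieces together so that words are neither repeated nor skipped. The following is a minimal sketch of both steps on plain sample indices and word lists. `stream_windows` and `merge_overlap` are hypothetical helpers, not functions from app.py or NeMo. Practical systems often merge on token timestamps rather than exact word matches:

```python
from typing import Iterator, List, Tuple


def stream_windows(num_samples: int, chunk: int, right_context: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample ranges: one chunk plus `right_context` lookahead.

    Consecutive windows overlap by up to `right_context` samples, because the
    lookahead of one window is the beginning of the next chunk.
    """
    if chunk <= 0 or right_context < 0:
        raise ValueError("need chunk > 0 and right_context >= 0")
    for start in range(0, num_samples, chunk):
        yield start, min(start + chunk + right_context, num_samples)


def merge_overlap(prev: List[str], new: List[str]) -> List[str]:
    """Append `new` to `prev`, dropping the longest prefix of `new` that
    exactly repeats the tail of `prev`, so overlapped audio is not doubled."""
    for k in range(min(len(prev), len(new)), 0, -1):
        if prev[-k:] == new[:k]:
            return prev + new[k:]
    return prev + new


if __name__ == "__main__":
    print(list(stream_windows(10, chunk=4, right_context=2)))  # [(0, 6), (4, 10), (8, 10)]
    print(merge_overlap(["open", "the"], ["the", "door"]))      # ['open', 'the', 'door']
```

Exact suffix/prefix matching is the simplest possible merge. It can wrongly drop a word the speaker genuinely repeated across a chunk boundary, which is one reason timestamp-based merging is often preferred.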