MediaJots Build Stack

Software and Tools Used to Build MediaJots

This chapter is a simple, non-technical walkthrough of the main tools inside MediaJots: what each tool does, and why it matters.

Visual Legend
🟦 App shell / interface
🟩 Media & transcript pipeline
🟨 AI generation layer
🟪 Storage & smart search

pywebview (Desktop App Shell)

  • What it does: Turns the app into a real desktop window instead of a browser tab.
  • Why MediaJots needs it: You can use MediaJots like a normal desktop app, while still getting a modern interface.
Browser tab UI   ->   🟦 Desktop app window
less app-like         more focused + cleaner experience

HTML, CSS, JavaScript (Frontend UI)

  • What it does: Builds everything you click and see: tabs, buttons, forms, status messages, and results.
  • Why MediaJots needs it: MediaJots has many steps, so it needs a clear and responsive interface.

SQLite (Local Persistence)

  • What it does: Saves your processed media, transcripts, settings, and search data on your own machine.
  • Why MediaJots needs it: It keeps your history and results on hand so you can reopen them quickly, without depending on a cloud database.
Without local storage:   open app -> start over each time
With SQLite:             open app -> continue where you left off 🟪
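
The "continue where you left off" idea can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name `media_items`, its columns, and the helper names are assumptions made for the example, not MediaJots's actual schema.

```python
import sqlite3


def open_store(path="mediajots_demo.db"):
    """Open (or create) a local database file. The path is a placeholder."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS media_items (
            id INTEGER PRIMARY KEY,
            url TEXT NOT NULL UNIQUE,
            transcript TEXT NOT NULL
        )
        """
    )
    return conn


def save_item(conn, url, transcript):
    """Store a processed item, replacing any earlier row for the same URL."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO media_items (url, transcript) VALUES (?, ?)",
            (url, transcript),
        )


def load_history(conn):
    """Return every saved (url, transcript) pair, oldest row id first."""
    rows = conn.execute("SELECT url, transcript FROM media_items ORDER BY id")
    return rows.fetchall()
```

Because the data lives in a file on disk, calling `open_store()` again after a restart and then `load_history()` would return the items saved in earlier sessions.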

requests (Internet Calls)

  • What it does: Lets MediaJots talk to online services (for transcription and AI features).
  • Why MediaJots needs it: Without this connection layer, MediaJots cannot send audio out or get AI responses back.
flowchart LR
    A[MediaJots App] --> B[requests]
    B --> C[AssemblyAI]
    B --> D[OpenRouter]

sequenceDiagram
    participant U as User Action
    participant M as MediaJots
    participant R as requests
    participant S as Online Service
    U->>M: Click Generate / Ask AI
    M->>R: Create API call
    R->>S: Send request
    S-->>R: Return response
    R-->>M: Parsed result
    M-->>U: Show output
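
The request and response steps in the diagram above can be sketched as one small function. This is a minimal sketch under assumptions: the helper name `call_service`, the `Bearer` header style, and the optional `post` parameter are made up for the example, and each real service documents its own URL and authentication format.

```python
def call_service(url, payload, api_key, post=None, timeout=30):
    """Send a JSON request and return the parsed JSON reply.

    `post` defaults to requests.post. Passing a substitute function makes
    the sketch usable in tests without any network access.
    """
    if post is None:
        import requests  # third-party; imported lazily so the sketch loads without it

        post = requests.post

    response = post(
        url,
        json=payload,  # body is sent as JSON
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth style
        timeout=timeout,  # avoid waiting forever on a stalled connection
    )
    response.raise_for_status()  # turn HTTP 4xx/5xx replies into an exception
    return response.json()
```

Setting a timeout and calling `raise_for_status()` means a slow or failing service surfaces as an error the app can report, rather than a frozen screen or a silently wrong result.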

yt-dlp (Media Extraction)

  • What it does: Pulls usable audio/video data from supported links.
  • Why MediaJots needs it: MediaJots must grab the media first before it can create transcripts or summaries.
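
As a rough illustration of the "grab the media first" step, yt-dlp can be driven from Python through its `YoutubeDL` class. The helper names, the `downloads` folder, and the chosen options below are assumptions for this sketch. How MediaJots actually configures yt-dlp is not stated here.

```python
from pathlib import Path


def audio_options(out_dir):
    """Build yt-dlp options that prefer an audio-only stream."""
    return {
        # audio-only stream if one exists, otherwise the best combined format
        "format": "bestaudio/best",
        # name the saved file after the media id and its real extension
        "outtmpl": str(Path(out_dir) / "%(id)s.%(ext)s"),
        "noplaylist": True,  # download a single item, not a whole playlist
        "quiet": True,
    }


def download_audio(url, out_dir="downloads"):
    """Download one item and return the path of the saved file."""
    import yt_dlp  # third-party; imported lazily so the sketch loads without it

    with yt_dlp.YoutubeDL(audio_options(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
        return ydl.prepare_filename(info)
```

The returned file path is what a later step would hand to the transcription service.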

AssemblyAI (Speech-to-Text)

  • What it does: Converts speech from audio into written text.
  • Why MediaJots needs it: The transcript is the base for everything else: summaries, OutScript, and search.

OpenRouter (LLM + Embeddings)

  • What it does: Connects MediaJots to AI models for summaries, structured output, and smarter search.
  • Why MediaJots needs it: It is what makes the app "AI-powered" instead of just a basic transcript viewer.
flowchart LR
    A[Transcript Text] --> B[OpenRouter]
    B --> C[Summary]
    B --> D[AI Answers]

Pydantic (Schema Validation)

  • What it does: Checks that AI responses follow the expected format.
  • Why MediaJots needs it: AI output can be messy; this keeps results reliable and prevents broken screens.
flowchart LR
    A[OpenRouter Raw Output] --> B[Pydantic Schema Check]
    B --> C{Valid JSON Shape?}
    C -->|Yes| D[Use Result in UI]
    C -->|No| E[Raise Validation Error]
    E --> F[Return Safe Fallback Message]
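
The check-then-fallback flow in the diagram above might look roughly like this in Python. The `Summary` model, its two fields, and the fallback message are invented for illustration, since the real MediaJots schemas are not shown in this chapter.

```python
import json
from typing import List

from pydantic import BaseModel, ValidationError


class Summary(BaseModel):
    """Expected shape of an AI summary (hypothetical fields)."""

    title: str
    key_points: List[str]


FALLBACK = {"ok": False, "message": "The AI response could not be read. Please try again."}


def parse_summary(raw_text):
    """Return a validated summary dict, or a safe fallback if the text is unusable."""
    try:
        summary = Summary(**json.loads(raw_text))
    except (json.JSONDecodeError, ValidationError, TypeError):
        # Not JSON at all, JSON that is not an object, or an object
        # whose fields do not match the expected shape.
        return dict(FALLBACK)
    return {"ok": True, "title": summary.title, "key_points": summary.key_points}
```

The interface only ever receives one of two known shapes, so a malformed AI reply leads to a readable message instead of a broken screen.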

sqlite-vec (Vector Similarity in SQLite)

  • What it does: Adds fast "meaning-based" search over saved transcript chunks.
  • Why MediaJots needs it: It helps AI Search find the most relevant moments quickly, not just exact keyword matches.
Keyword search: "exact word match"
AI search:      "same idea, different words" 🟪
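
To show what "same idea, different words" means in practice, here is a pure-Python stand-in for vector similarity search. It does not use sqlite-vec's own API. The three-number vectors in the usage note are hand-made toy values, whereas real embeddings come from an AI model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Score how closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vec, chunks, k=2):
    """chunks is a list of (text, vector) pairs; return the k most similar texts."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,  # highest similarity first
    )
    return [text for text, _ in ranked[:k]]
```

For example, with chunks `("how to reset a password", [0.9, 0.1, 0.0])`, `("recipe for pancakes", [0.0, 0.2, 0.9])`, and `("recovering account access", [0.8, 0.3, 0.1])`, a query vector of `[1.0, 0.2, 0.0]` ranks the password and account chunks first even though they share no keywords with each other.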

How These Tools Work Together

End-to-End Tool Chain in MediaJots

  1. App window + UI (pywebview + HTML/CSS/JS) lets you paste links, click actions, and review results.
  2. yt-dlp grabs media from supported links.
  3. AssemblyAI turns speech into transcript text.
  4. OpenRouter creates summaries and answers, and Pydantic checks that the output follows the expected format.
  5. SQLite + sqlite-vec save everything and power fast History, Keyword Search, and AI Search.

In plain terms: these tools work together so MediaJots can take a link, turn it into readable knowledge, and help you find key moments later.

flowchart TD
    A[Paste URL in MediaJots] --> B[Extract media with yt-dlp]
    B --> C[Transcribe speech with AssemblyAI]
    C --> D[Generate summary and AI answers via OpenRouter]
    D --> E[Validate structured output with Pydantic]
    E --> F[Save and search in SQLite + sqlite-vec]
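
The chain above can be pictured as a list of steps that each take the previous step's output. The sketch below uses placeholder functions that only tag a string, so it shows the ordering and nothing else. The function name `run_pipeline` and the step names are assumptions for this example.

```python
def run_pipeline(url, steps):
    """Pass a value through each (name, function) step in order.

    Returns the final value and the list of step names that ran.
    """
    value, completed = url, []
    for name, step in steps:
        value = step(value)
        completed.append(name)
    return value, completed


# Placeholder steps: each stands in for a real tool and just labels the value.
demo_steps = [
    ("extract media", lambda v: v + " -> audio"),
    ("transcribe", lambda v: v + " -> transcript"),
    ("summarize", lambda v: v + " -> summary"),
    ("validate", lambda v: v + " -> validated"),
    ("save", lambda v: v + " -> saved"),
]
```

If any step raises an error, the loop stops at that step, which mirrors how a failed download or transcription prevents the later stages from running on missing data.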