Multi-modal teaching workflow combining visual (Excalidraw) and audio (Gemini TTS) learning with Claude Code.
- Korean Language Instruction: Fun, playful Korean teacher persona
- Visual Learning: Automatic diagram generation using Excalidraw MCP
- Audio Learning: Korean text-to-speech using Gemini API
- Synchronized Output: Visual and audio content generated in parallel
- Session Transcripts: Automatic recording of all teaching sessions
- Chunked Delivery: 3-5 sentence chunks for better comprehension
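The chunked delivery idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual chunking logic: `chunkSentences` is a hypothetical helper, and splitting on `.`, `!`, or `?` followed by whitespace is an assumption about where sentences end.

```javascript
// Hypothetical helper (not from the project): split text into sentences,
// then group them into chunks of up to `maxSize` sentences.
function chunkSentences(text, minSize = 3, maxSize = 5) {
  // Split after ., !, or ? when followed by whitespace; punctuation is kept.
  const sentences = text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks = [];
  for (let i = 0; i < sentences.length; i += maxSize) {
    chunks.push(sentences.slice(i, i + maxSize));
  }

  // If the last chunk is too short, borrow sentences from the previous chunk
  // as long as that chunk stays at or above minSize.
  if (chunks.length > 1) {
    const last = chunks[chunks.length - 1];
    const prev = chunks[chunks.length - 2];
    while (last.length < minSize && prev.length > minSize) {
      last.unshift(prev.pop());
    }
  }
  return chunks.map((c) => c.join(" "));
}
```

With seven sentences, this yields one chunk of four sentences and one of three, rather than five followed by a short chunk of two.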
┌─────────────────────────────────────────────────────────────┐
│                    Claude (Main Teacher)                    │
│           Fun, playful Korean instructor persona            │
│               Chunks: 3-5 sentences at a time               │
└─────────────────┬───────────────────────┬───────────────────┘
                  │                       │
                  ▼                       ▼
┌─────────────────────────┐         ┌─────────────────────────┐
│ Drawing Subagent        │         │ Speaking Subagent       │
│ (Excalidraw MCP)        │         │ (TTS Express Server)    │
│ - Visualize concepts    │         │ - Korean audio          │
│ - Append drawings       │         │ - Stream to browser     │
│ - Color-coded           │         │ - Gemini API            │
└─────────────────────────┘         └─────────────────────────┘
- Node.js 18+
- Claude Code CLI
- Excalidraw MCP server (already configured in .mcp.json)
- Gemini API key (optional; required only for TTS audio)
- Configure API Key (optional):
  cp .env.example .env
  # Edit .env and add your GEMINI_API_KEY
  If you skip this step, the system still starts but TTS audio will be disabled.
- Install Dependencies:
  cd server
  npm install
- Make Scripts Executable:
  chmod +x scripts/*.sh
./scripts/start-all.sh
This will:
- Start Excalidraw Canvas on port 3333
- Start TTS Server on port 3334
- Install dependencies if needed
- Create necessary directories
- If GEMINI_API_KEY is not set, TTS audio is disabled but the UI still runs
Open your browser to: http://localhost:3334
In Claude Code, run:
/teach
Or simply ask in Korean: "가르쳐줘" ("teach me") or "설명해줘" ("explain it to me").
You: "재귀함수가 뭐예요?" ("What is a recursive function?")
Claude:
- Generates 3-5 Korean sentences explaining recursion
- Creates Excalidraw diagram visualizing the concept
- Sends text to TTS server for audio playback
- Waits for both to complete
- Continues with next chunk
- Records entire session to transcript
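The per-chunk flow above can be sketched as a small loop. This is only an illustration under stated assumptions: `draw`, `speak`, and `record` are hypothetical stand-ins for the drawing subagent, the speaking subagent, and the transcript writer, and none of these names come from the project.

```javascript
// Hypothetical orchestration sketch. The three callbacks are injected so the
// loop itself stays independent of Excalidraw, the TTS server, and the disk.
async function teachChunks(chunks, { draw, speak, record }) {
  let delivered = 0;
  for (const chunk of chunks) {
    // Start the visual and the audio work together, then wait for both.
    await Promise.all([draw(chunk), speak(chunk)]);
    // Log the chunk only after both outputs have finished.
    await record(chunk);
    delivered += 1;
  }
  return delivered;
}
```

Using `Promise.all` means a chunk takes as long as the slower of the two outputs, and the next chunk does not start until both are done.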
teaching-assistant/
├── .mcp.json                  # Excalidraw MCP config
├── .env.example               # API key template
├── .gitignore                 # Git ignore rules
├── README.md                  # This file
├── server/
│   ├── package.json           # Node.js dependencies
│   ├── index.js               # Express TTS server
│   └── public/
│       └── index.html         # Audio player interface
├── scripts/
│   ├── start-canvas.sh        # Start Excalidraw (existing)
│   └── start-all.sh           # Start everything (new)
├── transcripts/               # Session recordings
│   └── markdown-YYYY-MM-DD.md
└── docs/
    └── changelog/
        └── changelog-2024-12-22.md
Express server that:
- Accepts Korean text via POST /tts
- Calls Gemini TTS API
- Converts base64 PCM to WAV
- Streams audio via WebSocket
- Serves browser audio player
- Emits text streaming events for live UI updates
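The base64 PCM to WAV step can be illustrated with a short sketch. It assumes 24 kHz, mono, 16-bit little-endian PCM and prepends the standard 44-byte RIFF/WAVE header. `pcmBase64ToWav` is a hypothetical name, and the code in server/index.js may do this differently.

```javascript
// Wrap raw PCM samples (given as base64) in a 44-byte RIFF/WAVE header.
// Defaults assume 24 kHz, mono, 16-bit audio.
function pcmBase64ToWav(base64Pcm, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const pcm = Buffer.from(base64Pcm, "base64");
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second

  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);             // size of the fmt chunk for PCM
  header.writeUInt16LE(1, 20);              // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40);     // size of the sample data

  return Buffer.concat([header, pcm]);
}
```

The returned buffer can be sent to a browser as audio/wav, since raw PCM alone carries no sample rate or channel information.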
Endpoints:
- GET / - Live session interface
- POST /tts - Text-to-speech conversion
- GET /health - Server health check
- GET /events - Agent event stream (SSE)
- POST /events - Push agent event updates
- POST /tts emits text_delta and text_done events for streaming text
- WebSocket - Real-time audio streaming
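For readers unfamiliar with SSE, here is a minimal sketch of how a text_delta or text_done event could be framed on the wire. The event names come from this README, but the payload fields shown are assumptions, and `formatSseEvent` is a hypothetical helper rather than the server's actual code.

```javascript
// Format one Server-Sent Events frame: an `event:` line, a `data:` line,
// and a blank line that terminates the frame.
function formatSseEvent(eventName, payload) {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Hypothetical payloads; the real field names may differ.
const exampleFrames = [
  formatSseEvent("text_delta", { text: "안녕하세요" }),
  formatSseEvent("text_done", {}),
];
```

In a browser, an `EventSource` pointed at /events would receive such frames and dispatch them by event name.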
Defines:
- Korean instructor persona
- 3-5 sentence chunking strategy
- Excalidraw visualization patterns
- TTS coordination
- Transcript recording format
Browser interface with:
- Real-time progress indicators (text, drawing, audio)
- Event stream updates via SSE
- WebSocket audio playback
- Live chat panel unlocked after drawing completion
- Cropped Excalidraw preview rendered from /api/elements
- Preview zoom controls with mouse wheel support
- Preview pan with click-and-drag
Excalidraw diagrams use consistent colors:
- Purple (#667eea): Main concepts
- Green (#51cf66): Examples/details
- Red (#ff6b6b): Important/warnings
- Orange (#ffa94d): Connections/relationships
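One way to keep the palette consistent is to centralize it in a lookup table. The sketch below is illustrative only: the role names (`main`, `example`, `warning`, `connection`) and the `colorFor` helper are hypothetical, while the hex values are the ones listed above.

```javascript
// Palette from this README, keyed by hypothetical role names.
const DIAGRAM_COLORS = Object.freeze({
  main: "#667eea",       // purple: main concepts
  example: "#51cf66",    // green: examples and details
  warning: "#ff6b6b",    // red: important points and warnings
  connection: "#ffa94d", // orange: connections and relationships
});

// Look up the color for a role, falling back to the main-concept color
// when the role is unknown.
function colorFor(role) {
  return DIAGRAM_COLORS[role] ?? DIAGRAM_COLORS.main;
}
```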
All sessions are recorded to:
transcripts/markdown-YYYY-MM-DD.md
Format:
# Teaching Session - 2024-12-22
## Q: [User question]
## A (Chunk 1):
[Korean response]
---
## A (Chunk 2):
[Korean response]
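The transcript layout shown above could be produced by a small formatter like the following. This is a sketch under assumptions: `formatTranscript` and `transcriptPath` are hypothetical names, and the project may write its transcripts another way.

```javascript
// Build the transcript text for one question and its answer chunks,
// separating consecutive chunks with a "---" line.
function formatTranscript(date, question, chunks) {
  const lines = [`# Teaching Session - ${date}`, `## Q: ${question}`];
  chunks.forEach((chunk, i) => {
    if (i > 0) lines.push("---");
    lines.push(`## A (Chunk ${i + 1}):`, chunk);
  });
  return lines.join("\n");
}

// File path for a given date string such as "2024-12-22".
function transcriptPath(date) {
  return `transcripts/markdown-${date}.md`;
}
```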
- Check if port 3334 is available:
  lsof -ti:3334
- Check server logs:
  tail -f tts-server.log
- If you need audio, verify GEMINI_API_KEY is set:
  grep GEMINI_API_KEY .env
- Check if port 3333 is available:
  lsof -ti:3333
- Verify MCP server is running:
  screen -r excalidraw-canvas
- Ensure browser is connected to WebSocket
- If the UI shows TTS disabled, set GEMINI_API_KEY and restart
- Check browser console for errors
- Verify Gemini API key is valid
- Check server logs for TTS errors
View running servers:
screen -ls
Attach to sessions:
screen -r excalidraw-canvas # Excalidraw Canvas
screen -r tts-server # TTS Server
Detach: Ctrl-A, then D
Stop all servers (note that this ends every screen session you own, not only these two):
killall screen
Or stop individually:
screen -S excalidraw-canvas -X quit
screen -S tts-server -X quit
Requirement: GEMINI_API_KEY must be set to enable audio. Without it, the server runs but returns 503 for TTS requests.
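A client can treat the 503 case as "audio disabled" rather than as a hard failure. The sketch below assumes the local server listens on http://localhost:3334 and accepts a JSON body of the form `{ text }`. That body shape and the `requestSpeech` helper are assumptions, not documented behavior.

```javascript
// Hypothetical client for the local TTS server. `fetchImpl` is injectable so
// the function can be exercised without a running server.
async function requestSpeech(text, fetchImpl = fetch, baseUrl = "http://localhost:3334") {
  const res = await fetchImpl(`${baseUrl}/tts`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }), // assumed request shape
  });
  // 503 means the server is up but GEMINI_API_KEY is not configured.
  if (res.status === 503) return { ok: false, reason: "tts-disabled" };
  if (!res.ok) return { ok: false, reason: `http-${res.status}` };
  return { ok: true };
}
```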
Endpoint:
POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-tts:generateContent
Headers:
x-goog-api-key: YOUR_API_KEY
Content-Type: application/json
Body:
{
"contents": [{
"parts": [{ "text": "Korean text here" }]
}],
"generationConfig": {
"responseModalities": ["AUDIO"],
"speechConfig": {
"voiceConfig": {
"prebuiltVoiceConfig": {
"voiceName": "Kore"
}
}
}
}
}
Response: Base64 PCM audio (24kHz, mono, 16-bit)
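Put together, the request above might be issued from Node.js 18+ along these lines. This is a hedged sketch, not the project's code: `buildTtsRequest`, `extractPcmBase64`, and `synthesize` are hypothetical names, and reading the audio from `candidates[0].content.parts[0].inlineData.data` is an assumption about the response layout that should be checked against the Gemini API documentation.

```javascript
const GEMINI_TTS_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-tts:generateContent";

// Build the fetch options for the request documented above.
function buildTtsRequest(text, apiKey, voiceName = "Kore") {
  return {
    method: "POST",
    headers: { "x-goog-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({
      contents: [{ parts: [{ text }] }],
      generationConfig: {
        responseModalities: ["AUDIO"],
        speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName } } },
      },
    }),
  };
}

// Assumption: the base64 PCM sits in the first part of the first candidate.
function extractPcmBase64(responseJson) {
  const data = responseJson?.candidates?.[0]?.content?.parts?.[0]?.inlineData?.data;
  if (typeof data !== "string") throw new Error("No audio data in response");
  return data;
}

// Send the request and return the base64 PCM string.
async function synthesize(text, apiKey) {
  const res = await fetch(GEMINI_TTS_URL, buildTtsRequest(text, apiKey));
  if (!res.ok) throw new Error(`Gemini TTS request failed: ${res.status}`);
  return extractPcmBase64(await res.json());
}
```

Calling `synthesize("안녕하세요", process.env.GEMINI_API_KEY)` would then resolve to the base64 PCM string, which still needs a WAV header before a browser can play it.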
This is a personal teaching assistant project. Feel free to fork and customize.
MIT