Building Decktation: Voice Dictation Plugin for Steam Deck

Introduction
I've been enjoying my SteamDeck for a while now. I originally purchased it to use as my computer on the go. I had tried to make Android and Termux my ultra-portable computer, which I'd have to write about in a separate post. But in the end the convenience of having an x86 computer beats ARM. With the SteamDeck paired with XReal One glasses, on the plane or on the train, I have access to a proper x86 Linux environment (identical to my home computer).
After getting the Deck, I slowly but surely set up games and tried different versions of World of Warcraft. I played on Turtle WoW for a while, then tried Ascension WoW, and finally went back to Retail.
One of the main pain points of MMOs on SteamDeck is communication. Not even nuanced and detailed conversations, just the simple greetings, gg, ty, stuff like that. SteamDeck felt very clunky when it came to in-game chat.
I tried SteamDeck's on-screen keyboard, both using the touch screen and trackpad typing. As niche of an idea as this on-screen keyboard is, it's too slow and gets in the way too much to be practical when playing World of Warcraft.

Console Port, the addon that streamlines the use of controllers in WoW, also has an on-screen keyboard. Unfortunately this keyboard also falls short in terms of practicality and typing speed for in-game chat.

Automated Speech Recognition for SteamDeck
SteamDeck comes with a microphone, so naturally I thought of speech-to-text. If you google speech to text on SteamDeck, a Reddit thread comes up that explains a setup with a DeckyLoader plugin and a VOSK voice-to-text model.
Unfortunately decky-dictation's last commit is 2 years old and the setup process is not smooth.
Enter Decktation
Decktation is a DeckyLoader plugin that brings voice-to-text to SteamDeck using the FasterWhisper model. The plugin got its name from a fellow Redditor, TearyEyeBurningFace: https://www.reddit.com/r/SteamDeck/comments/wuajgy/comment/ilaja9g/. It runs entirely on-device with low latency, and uses game-specific context to accurately transcribe game-specific terminology. Push-to-talk via controller buttons, automatic chat channel detection ("/party", "/raid"), and support for WoW's game-specific vocabulary make in-game communication practical on Steam Deck.
Installation
Prerequisites: DeckyLoader must be installed on your Steam Deck.
- Download decktation.zip from the GitHub Releases page
- Extract and copy the plugin to your DeckyLoader plugins directory:
    unzip decktation.zip
    sudo cp -r decktation /home/deck/homebrew/plugins/
- Run the bundled setup script once to configure the keyboard simulation service:
    sudo /home/deck/homebrew/plugins/decktation/setup_ydotoold.sh
- Restart DeckyLoader (or use "Reload Plugins" in Decky settings)
- On first launch, Decktation will automatically download the Whisper model from HuggingFace — this may take a few minutes
The release ZIP comes with all Python dependencies pre-bundled, so no additional package installation is needed.
WoW Companion Addon (Optional)
For World of Warcraft players, installing the DecktationContext WoW addon is recommended. It feeds live game context (your zone, target, party members, spec) into Whisper on every transcription, improving accuracy for WoW-specific terminology like spell names, locations, and player names.
Context and game-specific lingo
Games such as WoW have their own lingo. Most of the communication involves specific names such as your character's race (Orc or Night Elf), various spells (Mortal Strike, Bloodlust), location names (Orgrimmar, Dornogal), and player names. All of these are quite hard to get right with Automatic Speech Recognition (ASR), as they are not words that you find in a dictionary or use in day-to-day English conversations.
Initially I considered adding these game-specific terms to VOSK. For ground truth, I thought of using existing Warcraft audio books. But assembling & cleaning all that data and adding custom terms to VOSK seemed like a convoluted task (https://alphacephei.com/vosk/lm).
So my research continued for a better alternative. Eventually I landed on OpenAI's Whisper model.
Whisper is quite flexible since it takes an initial prompt: (https://developers.openai.com/cookbook/examples/whisper_prompting_guide/). I use this prompt to provide context about the game and add custom terms and fix common ASR mistakes. I iteratively update this prompt whenever I run into transcription errors:
base_prompt = (
    "World of Warcraft gameplay discussion. "
    "Playing as orc warrior, tauren druid, blood elf paladin, undead warlock, troll shaman, or night elf hunter. "
    "Discussing enhancement shaman, restoration druid, protection warrior, holy paladin, arcane mage, shadow priest, affliction warlock. "
    "Running mythic dungeons, heroic raids, doing quests in Azeroth, Orgrimmar, Stormwind, Ironforge. "
    "Fighting bosses like Lich King, Ragnaros, Illidan, pulling trash mobs, need tank healer and DPS. "
    "Using abilities, cooldowns, buffs, debuffs, interrupts, dispels, cleave and AOE damage. "
    "Chat channel prefixes: say, party, raid, guild, officer, yell, instance, whisper, type. "
    "Common short phrases: hi, gg, brb, afk, lol, omw, ty, np, wp, gz."
)
Using FasterWhisper allows me to run inference locally on SteamDeck's CPU with surprisingly low latency.
To provide a dynamic context about the game for Decktation, I built a WoW companion plugin named Decktation Context that queries the game every 2 seconds and extracts the following information:
- Zone name
- Target
- Name of the players in the party or raid
- Player's class and specialization
This information is written to a JSON file that is picked up and added to Whisper's initial prompt on every invocation.
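A minimal sketch of how such a context file could be folded into the prompt and passed to faster-whisper. This is not the plugin's actual code: the JSON keys (`zone`, `target`, `class_spec`, `party`), the function names, and the `int8` compute type are assumptions made for illustration. The `initial_prompt` argument of `WhisperModel.transcribe` is the real hook that carries the context.

```python
import json
from pathlib import Path


def build_prompt(base_prompt: str, context_path: Path) -> str:
    """Append live game context to the static prompt, if the file is readable."""
    try:
        ctx = json.loads(context_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Addon not installed or file half-written: keep the static prompt.
        return base_prompt
    if not isinstance(ctx, dict):
        return base_prompt

    # Key names below are hypothetical; the real addon may use different ones.
    parts = []
    if ctx.get("zone"):
        parts.append(f"Currently in {ctx['zone']}.")
    if ctx.get("target"):
        parts.append(f"Targeting {ctx['target']}.")
    if ctx.get("class_spec"):
        parts.append(f"Playing {ctx['class_spec']}.")
    if ctx.get("party"):
        parts.append("Group members: " + ", ".join(ctx["party"]) + ".")
    return (base_prompt + " ".join(parts)).strip()


def transcribe(wav_path: str, prompt: str, model_size: str = "base") -> str:
    """Run faster-whisper on CPU with the assembled prompt."""
    from faster_whisper import WhisperModel  # imported lazily; heavy dependency

    # compute_type="int8" is an assumption chosen for CPU inference.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(wav_path, initial_prompt=prompt, language="en")
    return " ".join(segment.text.strip() for segment in segments)
```

Falling back to the static prompt when the file is missing keeps dictation working for players who skip the optional addon.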
Technical Architecture
Decktation uses DeckyLoader as its execution environment and consists of the following modules:
Frontend: Decktation's frontend is built using TypeScript and React. Settings such as the button prefix for push to talk, notification settings, and game-specific chat settings are configured in the Decky plugin UI.
Backend: The heavy lifting is done by a Python-based backend service that spawns and manages the controller listener. The listener uses evdev (https://python-evdev.readthedocs.io/) to pick up button presses and controller events. Configurable button combinations include L1, R1, L2, R2, A, B, X, Y. Unfortunately, I couldn't intercept the back paddles with evdev. SteamDeck exposes the controller buttons via different input devices, so it's possible that I didn't find the correct device to listen to for L5 and R5 to work. Similarly, the Steam button seems to be intercepted by SteamOS itself, as I wasn't able to listen to combinations starting with the Steam button for push to talk.
Voice processing: Decktation uses faster-whisper's base model for a balance between accuracy and performance. I did not run into resource contention issues when running M+ and Decktation at the same time. That said, we have the option to use a smaller model for faster turnaround time or a larger one for better accuracy.
Keyboard simulation: The last piece of this puzzle is typing the transcription back into the game's chat. Decktation uses ydotool for Wayland and Gamescope compatibility. Gamescope is a tool from Valve that allows games to run in an isolated Xwayland instance and supports AMD, Intel, and Nvidia GPUs. ydotool requires ydotoold to run as a systemd service, which is set up during the installation of the plugin.
I also included game-specific configurations in the plugin to define the set of buttons that need to be pressed to open the chat and send it. This includes different channels such as party chat, raid chat, etc., which can be identified when parsing the transcription. For instance, saying "party ..." will send the chat to the /p channel. Prefixing the message with "type ..." will only send the keystrokes to whichever element is focused in the UI. I use it to type into text boxes in the game. In the demo video I enter the filter for M+ keystone levels using the type command. Alternatively, the type command can be used to type in the URL for websites in the browser.
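A rough sketch of push-to-talk combo detection on top of evdev, under some assumptions: that L1 and R1 arrive as the standard Linux `BTN_TL` and `BTN_TR` key codes, and that the right input device has already been found (`device_path` is a placeholder). `ComboTracker` and `listen` are hypothetical names, not the plugin's own. The tracking logic is kept separate from the device loop so it can be exercised without hardware.

```python
from typing import Iterable, Optional, Set

# Linux input key codes (from linux/input-event-codes.h).
# Assumption: the Deck's L1/R1 are reported under these codes.
BTN_TL = 310  # L1
BTN_TR = 311  # R1


class ComboTracker:
    """Track held buttons and report when a combo starts or stops being held."""

    def __init__(self, combo: Iterable[int]) -> None:
        self.combo: Set[int] = set(combo)
        self.held: Set[int] = set()
        self.active = False

    def feed(self, code: int, value: int) -> Optional[str]:
        """Consume one EV_KEY event; value is 1 (press), 0 (release), 2 (repeat)."""
        if value == 1:
            self.held.add(code)
        elif value == 0:
            self.held.discard(code)
        now_active = self.combo <= self.held  # every combo button is held
        if now_active and not self.active:
            self.active = True
            return "start"
        if self.active and not now_active:
            self.active = False
            return "stop"
        return None


def listen(device_path: str, tracker: ComboTracker) -> None:
    """Blocking loop over a real input device; device_path is a placeholder."""
    import evdev  # python-evdev, imported lazily so the tracker works without it

    device = evdev.InputDevice(device_path)
    for event in device.read_loop():
        if event.type == evdev.ecodes.EV_KEY:
            action = tracker.feed(event.code, event.value)
            if action:
                print(f"recording {action}")
```

A tracker built with `ComboTracker([BTN_TL, BTN_TR])` reports "start" once both buttons are down and "stop" as soon as either is released.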
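A small sketch of what driving ydotool from Python can look like, assuming ydotool 1.x, where `ydotool key` takes `<keycode>:<state>` pairs (older releases use a different syntax) and ydotoold is already running. Using Enter to open and send chat is an assumption about the game's key bindings, and the function names are made up for this example.

```python
import subprocess
from typing import Callable, List

KEY_ENTER = 28  # Linux input key code for Enter


def key_tap(code: int) -> List[str]:
    """ydotool 1.x syntax: '<code>:1' presses the key, '<code>:0' releases it."""
    return ["ydotool", "key", f"{code}:1", f"{code}:0"]


def chat_commands(message: str, channel: str = "") -> List[List[str]]:
    """Commands to open chat, type an optional channel prefix plus message, and send."""
    text = f"{channel} {message}" if channel else message
    return [
        key_tap(KEY_ENTER),          # open the chat box (assumed binding)
        ["ydotool", "type", text],   # type the message
        key_tap(KEY_ENTER),          # send it
    ]


def send(commands: List[List[str]], runner: Callable = subprocess.run) -> None:
    """Execute the commands in order; runner is injectable for dry runs."""
    for cmd in commands:
        runner(cmd, check=True)
```

Building the command lists separately from running them makes it easy to log or dry-run what would be typed before touching the real input device.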
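The prefix routing described above could be sketched like this. Only the "party" to /p mapping and the "type" behaviour come from the text; the other channel entries, the `route` function, and its return shape are illustrative assumptions. The regular expression tolerates the capitalization and punctuation that Whisper tends to add after the first word.

```python
import re
from typing import Optional, Tuple

# Spoken keyword -> slash command. Only "party" -> "/p" is taken from the text;
# the remaining entries are illustrative guesses.
CHANNELS = {"party": "/p", "raid": "/raid", "guild": "/g", "say": "/s"}

# First word, then any separators Whisper may insert, then the rest.
_PREFIX = re.compile(r"^\s*(\w+)[\s,.:;!?-]*(.*)$", re.DOTALL)


def route(transcript: str) -> Tuple[str, Optional[str], str]:
    """Return (mode, channel, message) for a raw transcript."""
    match = _PREFIX.match(transcript)
    if match:
        keyword = match.group(1).lower()
        rest = match.group(2).strip()
        if keyword == "type" and rest:
            return ("type", None, rest)  # keystrokes only, no chat box
        if keyword in CHANNELS and rest:
            return ("chat", CHANNELS[keyword], rest)
    # No recognized prefix: send the whole transcript to the default channel.
    return ("chat", None, transcript.strip())
```

For example, `route("Party, pull the boss")` yields `("chat", "/p", "pull the boss")`, while a bare "gg" falls through to the default channel unchanged.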
Key Design Decisions and Challenges
Most of the code for this plugin was written using Claude Code. The biggest hurdles were understanding the Decky and Steam APIs and capturing controller inputs.
Controller Input Architecture: Using Steam's frontend APIs in Decky to listen to key presses only works in the Decky UI. Therefore it's not a feasible option to listen to button presses in the games. That's why I opted for evdev.
State Synchronization: I also ran into issues between the listener and the frontend. At times they can get out of sync especially for streaming-type events such as starting the recording, capturing the transcript, showing a preview to the user to cancel or proceed with the transcription.
Toast Notification Handling: Handling toast notifications also posed some challenges. These notifications can get queued and delayed. The API to control the sound (and to make them silent) does not work. It's either due to an implementation bug in Decky or changes in SteamOS's APIs. So I had to try different hacks to dismiss a toast notification when another message is in the queue and to make notifications silent.
Distribution and Dependencies: Ensuring a smooth setup experience for the Decktation plugin was also one of my main goals. On my own SteamDeck, I use nix to install packages in my home directory. This way I am not touching the read-only file system of the main OS and my packages won't be affected by system updates.
For the plugin, ideally we want a portable setup. The main body of the plugin is written in Python. And we have the OS's Python, my nix Python, and the version that is bundled with Decky Loader. This led to many inconsistencies as they can be three different versions with their own set of dependencies.
My solution to unify the Python execution environment and ship the plugin with its dependencies was to bundle the same version of Python that Decky currently uses (3.11) into Decktation under the lib directory. I build all the dependencies in GitHub's CI with Python 3.11. This way the only third-party dependencies that need to be installed are system packages such as ydotool, as well as the Whisper model itself, which is downloaded from HuggingFace the first time you launch the plugin.
Decktation performs 100% of its processing on device, and no telemetry or data collection takes place.
Transcription Preview: While Whisper performs very well with full-length sentences, at times the model gets confused by short text and incomplete sentences. For instance, when I say "Good Game", it types "Fruit Game". It can be partially blamed on my accent, but to prevent such embarrassing and hilarious moments I added an option to preview the transcript before sending it. This way you can press the push-to-talk button again to cancel sending the text.
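One way a plugin can prefer its bundled packages, sketched under the assumption that the lib directory holds importable packages built for a single Python version. `use_bundled_lib` is a hypothetical helper, not code from Decktation. It refuses to touch `sys.path` when the running interpreter does not match the version the packages were built for, since compiled wheels are version-specific.

```python
import sys
from pathlib import Path
from typing import Tuple


def use_bundled_lib(plugin_dir: Path, expected: Tuple[int, int] = (3, 11)) -> bool:
    """Put <plugin_dir>/lib first on sys.path if it exists and the version matches."""
    lib_dir = plugin_dir / "lib"
    if tuple(sys.version_info[:2]) != tuple(expected) or not lib_dir.is_dir():
        return False
    lib = str(lib_dir)
    if lib not in sys.path:
        # Front of the path so bundled packages win over system ones.
        sys.path.insert(0, lib)
    return True
```

Calling this at the top of the backend entry point, before any third-party import, would make the bundled copies take precedence over whatever the OS or nix Python provides.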
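The preview-then-cancel behaviour can be sketched with a `threading.Event`, assuming the preview is a fixed time window (the length of the window and the `PreviewGate` class are assumptions for illustration, not the plugin's implementation). The button listener calls `cancel()` on a second push-to-talk press, and the sender only types the text if the window expires untouched.

```python
import threading


class PreviewGate:
    """Hold a transcript for a short window during which it can be cancelled."""

    def __init__(self, seconds: float) -> None:
        self.seconds = seconds
        self._cancelled = threading.Event()

    def cancel(self) -> None:
        """Call from the button listener when push-to-talk is pressed again."""
        self._cancelled.set()

    def wait_and_confirm(self) -> bool:
        """Block for the preview window; True means send, False means cancelled."""
        self._cancelled.clear()
        # Event.wait returns True if cancel() fired before the timeout.
        return not self._cancelled.wait(timeout=self.seconds)
```

Because `Event.wait` returns as soon as the event is set, a cancel takes effect immediately instead of waiting out the rest of the window.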
Future Ideas
I have a shortlist of ideas for this plugin and I'd be happy to hear more about how other deckers use voice to text on their SteamDeck and how Decktation can help them.
Per-Game Configuration
- Auto-detect running game
- Game-specific profiles (WoW, FFXIV, generic FPS)
- Configurable chat activation keys per game
- Built-in presets for popular games
Enhanced Context
- Automatic context extraction from game memory or via plugins
- Learning from user corrections
- Shared context databases for common games