How it works
The computer use agent loop follows this pattern:
- User sends a command — e.g., “Open Firefox and search for AI news”
- Agent creates a desktop sandbox — an Ubuntu 22.04 environment with XFCE desktop and pre-installed applications
- Agent takes a screenshot — captures the current desktop state via E2B Desktop SDK
- LLM analyzes the screenshot — a vision model (e.g., OpenAI Computer Use API) decides what action to take
- Action is executed — click, type, scroll, or keypress via E2B Desktop SDK
- Repeat — new screenshot is taken and sent back to the LLM until the task is complete
Install the E2B Desktop SDK
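The SDK ships as both a Python and a JavaScript package. The package names below are the published names as of writing; check the E2B docs if they have moved. Both SDKs read your API key from the `E2B_API_KEY` environment variable.

```shell
# Python SDK
pip install e2b-desktop

# JavaScript SDK (what Surf uses)
npm install @e2b/desktop

# Both SDKs read your API key from the environment
export E2B_API_KEY=your-api-key
```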
The E2B Desktop SDK gives your agent a full Linux desktop with mouse, keyboard, and screen capture APIs.
Core implementation
The following snippets are adapted from E2B Surf.
Setting up the sandbox
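A minimal sketch, assuming the Python SDK (`e2b-desktop`) and an `E2B_API_KEY` in your environment. The method names (`stream.start`, `stream.get_url`, `kill`) follow the Python SDK documentation; verify them against the current reference before relying on them.

```python
def start_desktop_sandbox():
    """Create an E2B desktop sandbox and start VNC streaming."""
    # Lazy import so this module loads even without the SDK installed;
    # the import path is the Python SDK's package name.
    from e2b_desktop import Sandbox

    desktop = Sandbox()     # boots the Ubuntu 22.04 + XFCE environment
    desktop.stream.start()  # begin VNC streaming
    print("Watch the desktop at:", desktop.stream.get_url())
    return desktop


if __name__ == "__main__":
    desktop = start_desktop_sandbox()
    # ... run your agent loop here, then clean up:
    desktop.kill()
```

Keeping the stream URL handy is useful during development: you can watch the agent click around the desktop live in a browser tab.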
Creating a desktop sandbox and starting VNC streaming lets you view the desktop in a browser while the agent works.
Executing desktop actions
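One way to translate LLM-returned actions into SDK calls is a small dispatcher. The action schema below is illustrative, not the exact shape Surf uses, and the SDK method names (`move_mouse`, `left_click`, `write`, `press`, `scroll`) follow the Python `e2b-desktop` docs; double-check signatures against the current reference.

```python
def execute_action(desktop, action):
    """Dispatch one LLM-returned action onto the desktop.

    `action` is a dict such as {"type": "click", "x": 100, "y": 200}.
    """
    kind = action["type"]
    if kind == "click":
        desktop.move_mouse(action["x"], action["y"])
        desktop.left_click()
    elif kind == "double_click":
        desktop.move_mouse(action["x"], action["y"])
        desktop.double_click()
    elif kind == "type":
        desktop.write(action["text"])
    elif kind == "keypress":
        desktop.press(action["key"])
    elif kind == "scroll":
        # direction/amount signature assumed; confirm in the SDK reference
        desktop.scroll(action.get("direction", "down"), action.get("amount", 1))
    else:
        raise ValueError(f"unsupported action type: {kind}")
```

Raising on unknown action types is deliberate: silently dropping an action the model asked for makes failed runs much harder to debug.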
The E2B Desktop SDK maps directly to mouse and keyboard actions, so Surf needs little glue code to turn LLM-returned actions into desktop interactions.
Agent loop
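The cycle can be sketched as follows. This is a simplified, self-contained version, not Surf's actual implementation: `get_next_action` stands in for your LLM hook, and the desktop method names follow the Python `e2b-desktop` SDK and should be checked against the current reference.

```python
def run_agent(desktop, get_next_action, task, max_steps=30):
    """Screenshot -> LLM -> action cycle, capped at max_steps.

    `get_next_action` receives the task plus the latest screenshot bytes
    and returns an action dict; {"type": "done"} ends the run.
    """
    for _ in range(max_steps):
        screenshot = desktop.screenshot()           # current desktop state
        action = get_next_action(task, screenshot)  # vision model decides
        kind = action["type"]
        if kind == "done":
            return True
        if kind == "click":
            desktop.move_mouse(action["x"], action["y"])
            desktop.left_click()
        elif kind == "type":
            desktop.write(action["text"])
        elif kind == "keypress":
            desktop.press(action["key"])
    return False                                    # step budget exhausted
```

Capping the loop at `max_steps` is a simple safeguard against a model that never emits a terminal action.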
The core loop takes screenshots, sends them to an LLM, and executes the returned actions on the desktop until the task is complete; this is the cycle Surf drives. The getNextActionFromLLM / get_next_action_from_llm function is where you integrate your chosen LLM. See Connect LLMs to E2B for integration patterns with OpenAI, Anthropic, and other providers.
Related guides
Desktop template
Build desktop sandboxes with Ubuntu, XFCE, and VNC streaming
Connect LLMs
Integrate AI models with sandboxes using tool calling
Sandbox lifecycle
Create, manage, and control sandbox lifecycle