Computer use agents interact with graphical desktops the same way a human would — viewing the screen, clicking, typing, and scrolling. E2B provides the sandboxed desktop environment where these agents operate safely, with VNC streaming for real-time visual feedback. For a complete working implementation, see E2B Surf — an open-source computer use agent you can try via the live demo.

How it works

The computer use agent loop follows this pattern:
  1. User sends a command — e.g., “Open Firefox and search for AI news”
  2. Agent creates a desktop sandbox — an Ubuntu 22.04 environment with XFCE desktop and pre-installed applications
  3. Agent takes a screenshot — captures the current desktop state via E2B Desktop SDK
  4. LLM analyzes the screenshot — a vision model (e.g., OpenAI Computer Use API) decides what action to take
  5. Action is executed — click, type, scroll, or keypress via E2B Desktop SDK
  6. Repeat — new screenshot is taken and sent back to the LLM until the task is complete
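Before wiring this up, it helps to pin down the contract between steps 4 and 5: what shape the LLM's chosen action takes when it reaches your code. One way to model it is a TypeScript discriminated union with a validator. The field names below are an assumption (chosen to match the agent-loop snippet later on this page), not part of the E2B SDK:

```typescript
// Illustrative action schema -- these field names are an assumption, not an
// E2B SDK type. They mirror the switch statement in the agent loop below.
type Action =
  | { type: 'click'; x: number; y: number }
  | { type: 'type'; text: string }
  | { type: 'keypress'; keys: string }
  | { type: 'scroll'; scrollY: number }
  | { type: 'drag'; startX: number; startY: number; endX: number; endY: number }

// Narrow a loosely-typed LLM reply (already parsed from JSON) down to an
// Action, returning null when the reply is malformed or signals completion.
function parseAction(raw: unknown): Action | null {
  if (typeof raw !== 'object' || raw === null) return null
  const a = raw as Record<string, unknown>
  switch (a.type as string) {
    case 'click':
      return typeof a.x === 'number' && typeof a.y === 'number'
        ? { type: 'click', x: a.x, y: a.y }
        : null
    case 'type':
      return typeof a.text === 'string' ? { type: 'type', text: a.text } : null
    case 'keypress':
      return typeof a.keys === 'string' ? { type: 'keypress', keys: a.keys } : null
    case 'scroll':
      return typeof a.scrollY === 'number' ? { type: 'scroll', scrollY: a.scrollY } : null
    case 'drag':
      return [a.startX, a.startY, a.endX, a.endY].every((n) => typeof n === 'number')
        ? {
            type: 'drag',
            startX: a.startX as number,
            startY: a.startY as number,
            endX: a.endX as number,
            endY: a.endY as number,
          }
        : null
    default:
      return null // includes a 'done' signal or anything unrecognized
  }
}
```

Validating at this boundary keeps malformed model output from reaching the sandbox: an unrecognized action ends (or retries) the loop instead of throwing mid-interaction.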

Install the E2B Desktop SDK

The E2B Desktop SDK gives your agent a full Linux desktop with mouse, keyboard, and screen capture APIs.
npm i @e2b/desktop

Core implementation

The following snippets are adapted from E2B Surf.

Setting up the sandbox

Create a desktop sandbox and start VNC streaming so you can view the desktop in a browser.
import { Sandbox } from '@e2b/desktop'

// Create a desktop sandbox with a 5-minute timeout
const sandbox = await Sandbox.create({
  resolution: [1024, 720],
  dpi: 96,
  timeoutMs: 300_000,
})

// Start VNC streaming for browser-based viewing
await sandbox.stream.start()
const streamUrl = sandbox.stream.getUrl()
console.log('View desktop at:', streamUrl)

Executing desktop actions

The E2B Desktop SDK maps directly to mouse and keyboard actions. Here’s how Surf translates LLM-returned actions into desktop interactions.
import { Sandbox } from '@e2b/desktop'

const sandbox = await Sandbox.create({ timeoutMs: 300_000 })

// Mouse actions
await sandbox.leftClick(500, 300)
await sandbox.rightClick(500, 300)
await sandbox.doubleClick(500, 300)
await sandbox.middleClick(500, 300)
await sandbox.moveMouse(500, 300)
await sandbox.drag([100, 200], [400, 500])

// Keyboard actions
await sandbox.write('Hello, world!')  // Type text
await sandbox.press('Enter')          // Press a key

// Scrolling
await sandbox.scroll('down', 3)  // Scroll down 3 ticks
await sandbox.scroll('up', 3)    // Scroll up 3 ticks

// Screenshots
const screenshot = await sandbox.screenshot()  // Returns Buffer

// Run terminal commands
await sandbox.commands.run('ls -la /home')

Agent loop

The core loop takes screenshots, sends them to an LLM, and executes the returned actions on the desktop. This is a simplified version of how Surf drives the computer use cycle.
import { Sandbox } from '@e2b/desktop'

const sandbox = await Sandbox.create({
  resolution: [1024, 720],
  timeoutMs: 300_000,
})
await sandbox.stream.start()

while (true) {
  // 1. Capture the current desktop state
  const screenshot = await sandbox.screenshot()

  // 2. Send screenshot to your LLM and get the next action
  //    (use OpenAI Computer Use, Anthropic Claude, etc.)
  const action = await getNextActionFromLLM(screenshot)

  if (!action) break // LLM signals task is complete

  // 3. Execute the action on the desktop
  switch (action.type) {
    case 'click':
      await sandbox.leftClick(action.x, action.y)
      break
    case 'type':
      await sandbox.write(action.text)
      break
    case 'keypress':
      await sandbox.press(action.keys)
      break
    case 'scroll':
      await sandbox.scroll(
        action.scrollY < 0 ? 'up' : 'down',
        Math.abs(action.scrollY)
      )
      break
    case 'drag':
      await sandbox.drag(
        [action.startX, action.startY],
        [action.endX, action.endY]
      )
      break
  }
}

await sandbox.kill()
The getNextActionFromLLM function is where you integrate your chosen LLM. See Connect LLMs to E2B for integration patterns with OpenAI, Anthropic, and other providers.
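Whichever provider you choose, part of getNextActionFromLLM is translating the provider's action format into the shape the loop's switch statement consumes. As a sketch, OpenAI's Computer Use API returns `computer_call` items whose action objects use their own field names (e.g. a `scroll_y` value, key combos as an array). The field names below are an assumption based on that API; verify them against the current OpenAI documentation before relying on them:

```typescript
// Hypothetical provider-action shape (assumed, based on OpenAI's Computer
// Use API) -- verify field names against the current provider docs.
interface ProviderAction {
  type: string
  x?: number
  y?: number
  text?: string
  keys?: string[]
  scroll_y?: number
  path?: { x: number; y: number }[] // drag path as a list of points
}

// Map a provider action onto the shape the agent loop above consumes.
// Returns null for unsupported actions so the loop can skip or log them.
function toLoopAction(a: ProviderAction):
  | { type: 'click'; x: number; y: number }
  | { type: 'type'; text: string }
  | { type: 'keypress'; keys: string }
  | { type: 'scroll'; scrollY: number }
  | { type: 'drag'; startX: number; startY: number; endX: number; endY: number }
  | null {
  switch (a.type) {
    case 'click':
      return { type: 'click', x: a.x ?? 0, y: a.y ?? 0 }
    case 'type':
      return { type: 'type', text: a.text ?? '' }
    case 'keypress':
      // The provider reports combos as an array; join them for a single press
      return { type: 'keypress', keys: (a.keys ?? []).join('+') }
    case 'scroll':
      return { type: 'scroll', scrollY: a.scroll_y ?? 0 }
    case 'drag': {
      const path = a.path ?? []
      if (path.length < 2) return null
      const start = path[0]
      const end = path[path.length - 1]
      return { type: 'drag', startX: start.x, startY: start.y, endX: end.x, endY: end.y }
    }
    default:
      return null
  }
}
```

Keeping this mapping in one function makes it easy to swap providers later: only the adapter changes, while the sandbox-driving loop stays the same.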

Desktop template

Build desktop sandboxes with Ubuntu, XFCE, and VNC streaming

Connect LLMs

Integrate AI models with sandboxes using tool calling

Sandbox lifecycle

Create, manage, and control sandbox lifecycle