playwright-mcp

Official Microsoft Playwright MCP server, enabling LLMs to interact with web pages through structured accessibility snapshots

Tags: Microsoft Playwright, MCP server, web accessibility
Publisher: playwright-mcp
Submitted date: 4/13/2025

Unleashing LLMs with Playwright MCP: A Deep Dive into Browser Automation

The Playwright Model Context Protocol (MCP) server lets Large Language Models (LLMs) interact with web pages. It exposes Playwright's browser automation as a set of MCP tools (navigate, click, type, capture snapshots, manage tabs, and more), so an LLM can read and act on web content by default through structured accessibility snapshots rather than pixels.

Core Advantages

  • Efficiency: Operates on the accessibility tree, eliminating the need for computationally expensive vision models and pixel-based analysis.
  • LLM-Optimized: Provides structured data that LLMs can easily interpret and utilize, enabling more accurate and reliable interactions.
  • Deterministic Actions: Each action targets an exact element reference from the page snapshot, avoiding the ambiguity of locating elements in screenshots.

Use Case Applications

  • Intelligent Web Navigation: Enables LLMs to autonomously navigate websites, fill forms, and perform complex tasks.
  • Automated Data Extraction: Facilitates the extraction of structured data from web pages, streamlining data collection processes.
  • AI-Powered Testing: Lets LLMs drive automated testing workflows in a real browser.
  • Versatile Agent Interactions: Provides a foundation for building general-purpose browser agents capable of handling a wide range of web-based tasks.

Configuration Essentials

To use Playwright MCP, add this entry to the mcpServers section of your MCP client's configuration:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ]
    }
  }
}
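
If the client's configuration already lists other servers, the entry can be merged in instead of replacing the whole file. A minimal JavaScript sketch, assuming the configuration has already been parsed from JSON into a plain object; addPlaywrightServer is a hypothetical helper, not part of Playwright MCP:

```javascript
// Hypothetical helper: returns a new config with a "playwright" entry added
// under mcpServers, leaving any other configured servers untouched.
function addPlaywrightServer(config, extraArgs = []) {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      playwright: {
        command: "npx",
        args: ["@playwright/mcp@latest", ...extraArgs],
      },
    },
  };
}

// Example: an existing config that already has another (hypothetical) server.
const existing = { mcpServers: { other: { command: "other-server" } } };
const updated = addPlaywrightServer(existing);
console.log(JSON.stringify(updated, null, 2));
```

The input object is not mutated, so the caller can compare old and new configurations before writing the file back.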

Installation Guide for VS Code

Command-Line Installation

Register the server with the VS Code CLI:

# For VS Code
code --add-mcp '{"name":"playwright","command":"npx","args":["@playwright/mcp@latest"]}'

# For VS Code Insiders
code-insiders --add-mcp '{"name":"playwright","command":"npx","args":["@playwright/mcp@latest"]}'
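
The --add-mcp argument is the same server entry written as one JSON object with an added name field. Generating it with JSON.stringify avoids hand-quoting mistakes; in this sketch addMcpArg is a hypothetical helper, and the single-quote wrapping assumes a POSIX shell:

```javascript
// Hypothetical helper: builds the JSON string passed to `code --add-mcp`.
function addMcpArg(extraArgs = []) {
  return JSON.stringify({
    name: "playwright",
    command: "npx",
    args: ["@playwright/mcp@latest", ...extraArgs],
  });
}

// Single quotes are safe here as long as no argument contains a single quote.
console.log(`code --add-mcp '${addMcpArg()}'`);
```

With no extra arguments it prints the same VS Code command as the CLI example.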

Command-Line Options: Tailoring Playwright MCP to Your Needs

The Playwright MCP server offers a range of command-line options to customize its behavior:

  • --browser <browser>: Specify the browser or browser channel to use (e.g., chrome, firefox, webkit, msedge, chrome-beta, chrome-canary, chrome-dev, msedge-beta, msedge-canary, msedge-dev). Defaults to chrome.
  • --caps <caps>: Comma-separated list of capabilities to enable (possible values: tabs, pdf, history, wait, files, install). Defaults to all capabilities.
  • --cdp-endpoint <endpoint>: Connect to a specific Chrome DevTools Protocol (CDP) endpoint.
  • --executable-path <path>: Specify the path to the browser executable.
  • --headless: Run the browser in headless mode (no GUI). By default, the browser runs in headed mode.
  • --port <port>: Set the port for the Server-Sent Events (SSE) transport.
  • --user-data-dir <path>: Define the path to the user data directory.
  • --vision: Enable vision mode, which uses screenshots for interactions (accessibility snapshots are used by default).
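
When these flags are set from a script, they end up in the args array of the mcpServers entry. A sketch that maps an options object onto the documented flags; buildArgs and its option names are hypothetical, and joining the --caps value with commas assumes the flag takes a comma-separated list:

```javascript
// Hypothetical helper: converts an options object into the "args" array of an
// mcpServers entry. Only the command-line flags documented above are emitted.
function buildArgs(opts = {}) {
  const args = ["@playwright/mcp@latest"];
  if (opts.browser) args.push("--browser", opts.browser);
  if (opts.caps) args.push("--caps", opts.caps.join(","));
  if (opts.cdpEndpoint) args.push("--cdp-endpoint", opts.cdpEndpoint);
  if (opts.executablePath) args.push("--executable-path", opts.executablePath);
  if (opts.headless) args.push("--headless");
  if (opts.port !== undefined) args.push("--port", String(opts.port));
  if (opts.userDataDir) args.push("--user-data-dir", opts.userDataDir);
  if (opts.vision) args.push("--vision");
  return args;
}

console.log(buildArgs({ browser: "firefox", headless: true, caps: ["tabs", "pdf"] }));
```

Building the array rather than a single string keeps each flag and its value as separate process arguments, which is what the "args" field expects.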

User Data Directory: Managing Browser Profiles

Playwright MCP launches the browser with a dedicated profile stored in the following locations:

  • Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-chrome-profile
  • macOS: ~/Library/Caches/ms-playwright/mcp-chrome-profile
  • Linux: ~/.cache/ms-playwright/mcp-chrome-profile

This profile persists login information and other browsing data between sessions. Delete the directory if you want the next session to start from a clean state.
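
A cleanup script can compute the default location from the list above before deleting it. A sketch assuming --user-data-dir was not passed and that these three paths are the only defaults; defaultProfileDir is hypothetical, and treating every non-Windows, non-macOS platform like Linux is an assumption:

```javascript
// Hypothetical helper: returns the default profile directory for a platform.
// `platform` uses Node's process.platform values; `home` is the user's home
// directory (on Windows, the value of %USERPROFILE%).
function defaultProfileDir(platform, home) {
  switch (platform) {
    case "win32":
      return `${home}\\AppData\\Local\\ms-playwright\\mcp-chrome-profile`;
    case "darwin":
      return `${home}/Library/Caches/ms-playwright/mcp-chrome-profile`;
    default: // ASSUMPTION: any other platform uses the Linux location
      return `${home}/.cache/ms-playwright/mcp-chrome-profile`;
  }
}

console.log(defaultProfileDir(process.platform, process.env.HOME ?? process.env.USERPROFILE));
```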

Headless Browser Mode: Background Operations

To run the browser without a graphical interface, use the --headless flag:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest",
        "--headless"
      ]
    }
  }
}

Headed Browser on Linux without DISPLAY: Enabling SSE Transport

When running a headed browser on a system without a display or from IDE worker processes, execute the MCP server from an environment with the DISPLAY variable set and pass the --port flag to enable SSE transport:

npx @playwright/mcp@latest --port 8931

Then, configure the MCP client with the SSE endpoint URL:

{
  "mcpServers": {
    "playwright": {
      "url": "http://localhost:8931/sse"
    }
  }
}
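
With the url form, the client opens the /sse stream, and under MCP's HTTP+SSE transport the server sends an endpoint event telling the client where to POST its JSON-RPC messages. A sketch of reading that event from stream text the client has already received (a real client parses the stream incrementally); parseSse and messageEndpoint are illustrative helpers, and the session id in the sample is made up:

```javascript
// Minimal parser for complete text/event-stream content: events are separated
// by a blank line; only the "event:" and "data:" fields are read.
function parseSse(text) {
  const events = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type when none is given
    const data = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

// Resolves the endpoint event's (possibly relative) URI against the SSE URL.
function messageEndpoint(sseUrl, text) {
  const ep = parseSse(text).find((e) => e.event === "endpoint");
  return ep ? new URL(ep.data, sseUrl).href : undefined;
}

// Illustrative stream content; the session id is made up.
const sample = "event: endpoint\ndata: /messages?sessionId=abc123\n\n";
console.log(messageEndpoint("http://localhost:8931/sse", sample));
```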

Tool Modes: Snapshot vs. Vision

Playwright MCP offers two distinct tool modes:

  1. Snapshot Mode (Default): Leverages accessibility snapshots for optimal performance and reliability.
  2. Vision Mode: Employs screenshots for visual-based interactions.

To activate Vision Mode, include the --vision flag when starting the server:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest",
        "--vision"
      ]
    }
  }
}

Vision Mode is particularly effective when used with computer vision models capable of interacting with elements based on X/Y coordinates derived from screenshots.
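
In both modes an MCP client invokes tools with the tools/call JSON-RPC method; the difference is how the target element is identified. A sketch with a made-up ref and made-up coordinates; toolCall and clickSubmit are illustrative helpers, not part of Playwright MCP:

```javascript
// Wraps a tool invocation in an MCP tools/call JSON-RPC request.
function toolCall(id, name, args) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The same intent ("click Submit") expressed in each mode.
// Snapshot mode: target by a ref taken from browser_snapshot output.
// Vision mode: target by pixel coordinates read off a screenshot.
function clickSubmit(mode, target) {
  return mode === "vision"
    ? toolCall(1, "browser_screen_click", { element: "Submit button", x: target.x, y: target.y })
    : toolCall(1, "browser_click", { element: "Submit button", ref: target.ref });
}

console.log(JSON.stringify(clickSubmit("snapshot", { ref: "e12" })));
console.log(JSON.stringify(clickSubmit("vision", { x: 640, y: 360 })));
```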

Programmatic Usage: Custom Transports

For advanced customization, you can use Playwright MCP programmatically with custom transports:

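import http from 'node:http';
import { createServer } from '@playwright/mcp';
import { SSEServerTransport } from '@modelcontextprotocol/sdk/server/sse.js';

http.createServer(async (req, res) => {
  // ...
  const server = createServer({ launchOptions: { headless: true } });
  // Stream server messages to this client over SSE; the client POSTs to /messages
  const transport = new SSEServerTransport('/messages', res);
  await server.connect(transport);
});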

API Reference: Snapshot-Based Interactions

browser_click

  • Description: Clicks on a web page element.
  • Parameters:
    • element (string): Human-readable description of the element.
    • ref (string): Exact element reference from the page snapshot.

browser_hover

  • Description: Hovers the mouse over an element.
  • Parameters:
    • element (string): Human-readable description of the element.
    • ref (string): Exact element reference from the page snapshot.

browser_drag

  • Description: Drags and drops an element.
  • Parameters:
    • startElement (string): Human-readable description of the source element.
    • startRef (string): Exact source element reference from the page snapshot.
    • endElement (string): Human-readable description of the target element.
    • endRef (string): Exact target element reference from the page snapshot.

browser_type

  • Description: Types text into an editable element.
  • Parameters:
    • element (string): Human-readable description of the element.
    • ref (string): Exact element reference from the page snapshot.
    • text (string): Text to type.
    • submit (boolean, optional): Whether to submit the text (press Enter).
    • slowly (boolean, optional): Types text one character at a time, useful for triggering key handlers.

browser_select_option

  • Description: Selects an option in a dropdown.
  • Parameters:
    • element (string): Human-readable description of the element.
    • ref (string): Exact element reference from the page snapshot.
    • values (array): Array of values to select.

browser_snapshot

  • Description: Captures an accessibility snapshot of the current page.
  • Parameters: None

browser_take_screenshot

  • Description: Takes a screenshot of the current page.
  • Parameters:
    • raw (boolean, optional): If true, returns a lossless PNG; otherwise returns a JPEG (the default).
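
A typical snapshot-mode step is: call browser_snapshot, pick the ref of the element you want, then pass it to an action tool. The sketch below does the middle step on a hypothetical snapshot. It assumes snapshot lines look like - role "name" [ref=...]; that format is an assumption, so check the actual browser_snapshot output of your version. findRef and toolCall are illustrative helpers:

```javascript
// ASSUMPTION: each snapshot line looks like `- role "name" [ref=...]`.
// Returns the ref of the first element with the given role and accessible name.
function findRef(snapshot, role, name) {
  for (const line of snapshot.split("\n")) {
    const m = line.match(/^\s*-\s+(\S+)\s+"([^"]*)".*\[ref=([^\]]+)\]/);
    if (m && m[1] === role && m[2] === name) return m[3];
  }
  return undefined;
}

// Wraps a tool invocation in an MCP tools/call JSON-RPC request.
function toolCall(id, name, args) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical snapshot text for a search form.
const snapshot = ['- textbox "Search" [ref=e3]', '- button "Go" [ref=e4]'].join("\n");

// Type a query into the search box and press Enter (submit: true).
const ref = findRef(snapshot, "textbox", "Search");
const request = toolCall(2, "browser_type", { element: "Search box", ref, text: "playwright", submit: true });
console.log(JSON.stringify(request.params));
```

Refs come from a specific snapshot, so after an action changes the page it is safer to take a fresh snapshot before looking up the next ref.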

API Reference: Vision-Based Interactions

browser_screen_move_mouse

  • Description: Moves the mouse to a specific position.
  • Parameters:
    • element (string): Human-readable description of the element.
    • x (number): X coordinate.
    • y (number): Y coordinate.

browser_screen_capture

  • Description: Takes a screenshot of the current page.
  • Parameters: None

browser_screen_click

  • Description: Clicks the left mouse button at the given coordinates.
  • Parameters:
    • element (string): Human-readable description of the element.
    • x (number): X coordinate.
    • y (number): Y coordinate.

browser_screen_drag

  • Description: Drags with the left mouse button held down, from the start coordinates to the end coordinates.
  • Parameters:
    • element (string): Human-readable description of the element.
    • startX (number): Starting X coordinate.
    • startY (number): Starting Y coordinate.
    • endX (number): Ending X coordinate.
    • endY (number): Ending Y coordinate.

browser_screen_type

  • Description: Types text.
  • Parameters:
    • text (string): Text to type.
    • submit (boolean, optional): Whether to submit the text (press Enter).
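
Vision models often report a bounding box rather than a point, while the screen tools take single coordinates. A sketch that targets each box's center, assuming the boxes are in the same pixel space as the screenshot from browser_screen_capture; center and dragBetween are hypothetical helpers:

```javascript
// Center of a bounding box {x, y, width, height}, rounded to whole pixels.
function center(box) {
  return { x: Math.round(box.x + box.width / 2), y: Math.round(box.y + box.height / 2) };
}

// Wraps a tool invocation in an MCP tools/call JSON-RPC request.
function toolCall(id, name, args) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Builds a browser_screen_drag call from the center of one box to the center of another.
function dragBetween(id, element, from, to) {
  const start = center(from);
  const end = center(to);
  return toolCall(id, "browser_screen_drag", {
    element, startX: start.x, startY: start.y, endX: end.x, endY: end.y,
  });
}

// Hypothetical boxes: a card and the column it should be dropped on.
const card = { x: 100, y: 200, width: 80, height: 40 };
const column = { x: 400, y: 150, width: 200, height: 300 };
console.log(JSON.stringify(dragBetween(3, "Task card", card, column).params.arguments));
```

For these two boxes the call drags from (140, 220) to (500, 300).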

API Reference: Tab Management

browser_tab_list

  • Description: Lists all browser tabs.
  • Parameters: None

browser_tab_new

  • Description: Opens a new tab.
  • Parameters:
    • url (string, optional): The URL to navigate to in the new tab. If not provided, the new tab will be blank.

browser_tab_select

  • Description: Selects a tab by index.
  • Parameters:
    • index (number): The index of the tab to select.

browser_tab_close

  • Description: Closes a tab.
  • Parameters:
    • index (number, optional): The index of the tab to close. Closes the current tab if not provided.
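
Tab operations depend on the previous step, so a client typically sends them as an ordered sequence with distinct JSON-RPC ids and waits for each response before sending the next. A sketch that builds such a sequence; planRequests is a hypothetical helper, and the URL is a placeholder:

```javascript
// Hypothetical helper: turns [toolName, args] steps into tools/call requests
// with increasing JSON-RPC ids, so each response can be matched to its request.
function planRequests(steps, firstId = 1) {
  return steps.map(([name, args = {}], i) => ({
    jsonrpc: "2.0",
    id: firstId + i,
    method: "tools/call",
    params: { name, arguments: args },
  }));
}

// Open a page in a new tab, list the tabs, then close the current tab
// (browser_tab_close without an index closes the current tab).
const requests = planRequests([
  ["browser_tab_new", { url: "https://example.com" }],
  ["browser_tab_list"],
  ["browser_tab_close"],
]);
for (const r of requests) console.log(r.id, r.params.name, JSON.stringify(r.params.arguments));
```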

API Reference: Navigation

browser_navigate

  • Description: Navigates to a URL.
  • Parameters:
    • url (string): The URL to navigate to.

browser_navigate_back

  • Description: Navigates back to the previous page.
  • Parameters: None

browser_navigate_forward

  • Description: Navigates forward to the next page.
  • Parameters: None

API Reference: Keyboard

browser_press_key

  • Description: Presses a key on the keyboard.
  • Parameters:
    • key (string): Name of the key to press (e.g., ArrowLeft, a).

API Reference: Files and Media

browser_file_upload

  • Description: Uploads one or more files.
  • Parameters:
    • paths (array): An array of absolute file paths to upload.
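
browser_file_upload takes absolute paths, so a client can reject relative paths before making the call. A sketch; isAbsolutePath and fileUploadArgs are hypothetical helpers, and the check covers only POSIX paths and Windows drive-letter or UNC paths:

```javascript
// Accepts POSIX absolute paths (/tmp/...), Windows drive paths (C:\... or C:/...)
// and UNC paths (\\server\share\...).
function isAbsolutePath(p) {
  return p.startsWith("/") || /^[A-Za-z]:[\\/]/.test(p) || p.startsWith("\\\\");
}

// Hypothetical pre-flight check: builds browser_file_upload arguments or throws.
function fileUploadArgs(paths) {
  const bad = paths.filter((p) => !isAbsolutePath(p));
  if (bad.length > 0) throw new Error(`Not absolute: ${bad.join(", ")}`);
  return { paths };
}

console.log(fileUploadArgs(["/tmp/report.pdf", "C:\\Users\\me\\photo.png"]));
```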

browser_pdf_save

  • Description: Saves the page as a PDF.
  • Parameters: None

API Reference: Utilities

browser_wait

  • Description: Waits for a specified time.
  • Parameters:
    • time (number): The time to wait in seconds (capped at 10 seconds).
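
Because a single browser_wait is capped at 10 seconds, a longer pause can be split into several calls. A sketch of that arithmetic; waitChunks is a hypothetical helper:

```javascript
// Maximum per-call wait, as documented for browser_wait.
const MAX_WAIT_SECONDS = 10;

// Splits a total wait into chunks of at most MAX_WAIT_SECONDS each.
function waitChunks(totalSeconds) {
  const chunks = [];
  let remaining = totalSeconds;
  while (remaining > 0) {
    const step = Math.min(remaining, MAX_WAIT_SECONDS);
    chunks.push(step);
    remaining -= step;
  }
  return chunks;
}

// One browser_wait argument object per chunk: 25 s becomes 10 + 10 + 5.
const waits = waitChunks(25).map((time) => ({ name: "browser_wait", arguments: { time } }));
console.log(waits.map((w) => w.arguments.time)); // [ 10, 10, 5 ]
```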

browser_close

  • Description: Closes the page.
  • Parameters: None

browser_install

  • Description: Installs the browser specified in the config.
  • Parameters: None
