

Part of the reason that I mentioned the background script context first is to contrast it against the content script context.
The background script context is owned completely by your extension and exists as long as your extension is running. It’s sort of like the control center for your extension where you’ll put code that manages state over time and coordinates code in the other contexts. The content script context is a little more complicated. You specify URL match patterns for your content script code, and then your code will be injected and run in a content script context every time a browser tab navigates to a matching URL. You can access and modify the tab’s DOM in that context, do any of the normal JavaScript stuff that you know and love, and your context will be destroyed when the tab is closed or navigated to another URL. Things generally appear like the code is executing in the page context, but it isn’t.
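To make the difference between the page context and the content script context a bit more concrete, here is a rough analogy that runs in plain Node.js rather than in a browser. It uses Node's built-in `vm` module to create two separate global scopes that share one object standing in for the DOM. This is only a sketch of the general idea, not how browsers actually implement content script isolation, and the names `sharedDom`, `pageWorld`, and `contentScriptWorld` are made up for illustration.

```javascript
// A rough analogy only: Node's built-in vm module stands in for the browser's
// separate JavaScript worlds. None of these names come from the WebExtensions API.
const vm = require('node:vm');

// Hypothetical stand-in for the tab's DOM: a single object visible to both worlds.
const sharedDom = { title: 'Original title' };

// Each context has its own global scope, but both reference the same "DOM".
const pageWorld = vm.createContext({ document: sharedDom });
const contentScriptWorld = vm.createContext({ document: sharedDom });

// The "page" defines a global variable and touches the DOM.
vm.runInContext(
  'var pageSecret = 42; document.title = "Set by the page";',
  pageWorld,
);

// The "content script" sees the DOM change, but not the page's global variable.
const seenByContentScript = vm.runInContext(
  '({ title: document.title, typeOfPageSecret: typeof pageSecret })',
  contentScriptWorld,
);

// Changes flow the other way through the DOM too, while globals stay private.
vm.runInContext(
  'var helper = "content script only"; document.title = "Set by the content script";',
  contentScriptWorld,
);
const seenByPage = vm.runInContext(
  '({ title: document.title, typeOfHelper: typeof helper })',
  pageWorld,
);

console.log(seenByContentScript);
console.log(seenByPage);
```

In this sketch, each world observes the other's change to the shared title, but `typeof pageSecret` and `typeof helper` both come back as `undefined` from the opposite world. That is roughly the situation a content script is in: the DOM is shared with the page, while JavaScript variables are not.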

The main contexts that you’ll ever work with are the background script context and the content script context.

To give a quick crash course on WebExtensions, they very roughly consist of a variety of JavaScript HTML page contexts and the code that runs in those contexts. You can specify HTML documents and other resources to populate the DOM in those contexts, but that’s mostly tangential to browser automation.
We did our best to answer those questions, but we figured that writing an explicit guide here would be a good idea since it seems like a common problem. In this guide, we’ll start off by explaining a bit about how WebExtensions work, and how content script sandboxing can lead to unexpected and confusing behavior. Then we’ll develop code that can be used as a drop-in solution for breaking out of the content script sandbox so that code can be run directly in the context of webpages themselves. As always, you can find the finished product in the intoli-article-materials repository on GitHub, so feel free to skip over there if you want to see how all of the pieces fit together in the end. And be sure to star the repo to find out about new articles from the Intoli blog before they’re released!

Understanding the Problem

Recently, a number of people have been emailing us to ask for help when they run into limitations caused by their scripts running in a sandboxed environment.
WebExtensions are a frequently underappreciated tool for the purposes of web scraping and browser automation. They provide an easy way to access an extremely powerful API that’s cross-browser compatible out of the box, and that API provides functionality that extends far beyond that of more specialized automation APIs like the Chrome DevTools Protocol or Firefox’s Marionette. For example, the WebExtensions API provides a mechanism for containerizing individual tabs, something that Selenium and Puppeteer can’t do! We’re such big fans of using the WebExtensions API for web scraping here at Intoli that we built a whole web automation framework based on it. We’ve also frequently recommended WebExtensions in our articles as a way to supplement other common browser automation frameworks. Doing so makes it possible to inject JavaScript before page loads, and also allows you to make use of the full privileged WebExtensions API.
