Recently, I decided to move one of my web scraping scripts from Python over to R because, frankly, I’m much more comfortable in R. Everything ran perfectly on my Windows machine, but the moment I tried it on my Linux server, boom, I hit the dreaded:
“probably user data directory is already in use specify a unique value for user-data-dir, or don’t use it”
If you’ve been there, you know it’s frustrating. So let me walk you through exactly what I did, why the error happens, and how I fixed it with a more production-ready scraping setup.
My Starting Point
Here’s the code I started with: pretty standard selenider + rvest scraping:
library(selenider)
library(rvest)
session <- selenider_session("selenium", browser = "chrome")
Sys.sleep(3)
open_url("https://egamersworld.com/callofduty/matches")
elements <- session |> get_page_source() |> html_elements(".item_teams__cKXQT")
res <- data.frame(
home_team_name = elements |>
html_elements(".item_team__evhUQ:nth-child(1) .item_teamName__NSnfH") |>
html_text(trim = TRUE),
home_team_odds = elements |>
html_elements(".item_team__evhUQ:nth-child(1) .item_odd__Lm2Wl") |>
html_text(trim = TRUE),
away_team_name = elements |>
html_elements(".item_team__evhUQ:nth-child(3) .item_teamName__NSnfH") |>
html_text(trim = TRUE),
away_team_odds = elements |>
html_elements(".item_team__evhUQ:nth-child(3) .item_odd__Lm2Wl") |>
html_text(trim = TRUE),
match_date = elements |>
html_elements(".item_scores__Vi7YX .item_date__g4cq_") |>
html_text(trim = TRUE),
match_time = elements |>
html_elements(".item_scores__Vi7YX .item_time__xBia_") |>
html_text(trim = TRUE),
match_type = elements |>
html_elements(".item_scores__Vi7YX .item_bo__u2C9Q") |>
html_text(trim = TRUE)
)
This happily pulled match data on Windows. On Linux? Not so much.
Why Linux Throws This Error
Here’s the thing: Chrome uses something called the user data directory to store your profile, cookies, cache, and extensions. On Linux, if:
- multiple Chrome/Chromedriver processes try to use the same directory, or
- a previous run crashed and left a lock file
then Chrome refuses to start. Selenium tries to launch Chrome, Chrome says “nope,” and you get the error.
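If you want to check whether a stale lock is what’s blocking you, a quick diagnostic from R looks something like this. The path is an assumption: Chrome’s default profile on Linux usually lives in ~/.config/google-chrome (Chromium uses ~/.config/chromium), so adjust it to your setup:
# Quick diagnostic sketch: look for Chrome's profile lock in the default user data dir
# (path is an assumption; adjust for Chromium or a custom profile location)
lock_file <- path.expand("~/.config/google-chrome/SingletonLock")
# SingletonLock is a symlink whose target encodes "<hostname>-<pid>" of the Chrome
# process holding the profile; if that process is long gone, the lock is stale
lock_target <- Sys.readlink(lock_file)
if (!is.na(lock_target) && nzchar(lock_target)) {
  message("Profile lock present: ", lock_file, " -> ", lock_target)
} else {
  message("No profile lock found")
}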
Two key points I learned:
- --user-data-dir is a Chrome flag, not a Selenium server flag. If you pass it to Selenium’s server options, nothing happens (see the sketch below for where it actually has to go).
- Python Selenium often creates a temporary profile automatically. R’s selenider wasn’t doing that here; it was trying to reuse the default Chrome profile.
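To make the first point concrete, here’s a minimal sketch of where the flag ultimately has to land: inside the goog:chromeOptions capability that chromedriver forwards to Chrome. The exact way you attach this list to a session varies by client package and version; the shape of the list is the part Chrome actually sees:
# Sketch: a per-run profile plus the capability payload chromedriver expects.
# How you hand this list to your Selenium client differs between packages;
# the "goog:chromeOptions" -> args structure is what reaches Chrome.
chrome_profile <- tempfile("chrome-profile-")
caps <- list(
  "goog:chromeOptions" = list(
    args = list(
      paste0("--user-data-dir=", chrome_profile),  # a Chrome flag, not a server flag
      "--headless=new"
    )
  )
)
str(caps)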
A Unique Chrome Profile
The trick is simple: create a new temporary profile directory every time the script runs, and tell Chrome to use it. While we’re at it, I added some extras so it runs nicely on a headless Linux server or in Docker:
- --headless=new for modern headless mode
- --no-sandbox and --disable-dev-shm-usage for low-memory or container setups
- explicit waits so JavaScript-rendered elements load
- cleanup of temporary profile directories
The Improved, Linux-Safe Script
library(selenider)
library(rvest)
library(glue)
library(withr)
# Create a unique Chrome profile directory
chrome_profile <- tempfile("chrome-profile-")
dir.create(chrome_profile, recursive = TRUE, showWarnings = FALSE)
# Chrome arguments (passed to Chrome, not Selenium server)
chrome_args <- c(
glue("--user-data-dir={chrome_profile}"),
"--headless=new",
"--no-sandbox",
"--disable-dev-shm-usage",
"--disable-gpu",
"--window-size=1920,1080"
)
# Selenium options: browser arguments have to reach Chrome itself, so they go
# into the "goog:chromeOptions" capability that chromedriver forwards to Chrome,
# not into the Selenium server's own options
session_options <- selenium_options(
  client_options = selenium_client_options(
    capabilities = list(
      "goog:chromeOptions" = list(args = as.list(chrome_args))
    )
  )
)
# Clean up when the session ends. on.exit() only works inside a function,
# so for a top-level script register the cleanup with withr::defer() instead.
defer({
  try(close_session(), silent = TRUE)
  try(unlink(chrome_profile, recursive = TRUE, force = TRUE), silent = TRUE)
}, envir = globalenv())
# Start session
session <- selenider_session(
  "selenium",
  browser = "chrome",
  options = session_options
)
# Go to the page
open_url("https://egamersworld.com/callofduty/matches")
# Wait for the dynamic (JavaScript-rendered) match cards to appear
s(".item_teams__cKXQT") |>
  elem_wait_until(is_present, timeout = 10)
# Parse the page
src <- get_page_source()
elements <- html_elements(src, ".item_teams__cKXQT")
res <- data.frame(
home_team_name = elements |>
html_elements(".item_team__evhUQ:nth-child(1) .item_teamName__NSnfH") |>
html_text(trim = TRUE),
home_team_odds = elements |>
html_elements(".item_team__evhUQ:nth-child(1) .item_odd__Lm2Wl") |>
html_text(trim = TRUE),
away_team_name = elements |>
html_elements(".item_team__evhUQ:nth-child(3) .item_teamName__NSnfH") |>
html_text(trim = TRUE),
away_team_odds = elements |>
html_elements(".item_team__evhUQ:nth-child(3) .item_odd__Lm2Wl") |>
html_text(trim = TRUE),
match_date = elements |>
html_elements(".item_scores__Vi7YX .item_date__g4cq_") |>
html_text(trim = TRUE),
match_time = elements |>
html_elements(".item_scores__Vi7YX .item_time__xBia_") |>
html_text(trim = TRUE),
match_type = elements |>
html_elements(".item_scores__Vi7YX .item_bo__u2C9Q") |>
html_text(trim = TRUE),
stringsAsFactors = FALSE
)
print(glue("Scraped {nrow(res)} rows"))
# Save results
outfile <- sprintf("egamers_callofduty_matches_%s.csv", format(Sys.time(), "%Y%m%d_%H%M%S"))
write.csv(res, outfile, row.names = FALSE)
message(glue("Saved to {outfile}"))
What’s Better Now
- No more “user data dir already in use”: every run gets a fresh profile.
- Truly headless: works without a display server, perfect for cron jobs (see the example after this list).
- Safe in Docker: flags like --no-sandbox prevent Chrome crashes in containers.
- Dynamic content handled: explicit waits make sure JavaScript has done its thing before scraping.
- Clean exit: temp profiles are deleted automatically.
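Since the whole point of going headless is unattended runs, here’s roughly how a scheduled job can be wired up. This is only an illustrative crontab entry; the schedule, paths, and script name are placeholders, and it assumes Rscript is on the PATH that cron sees:
# Hypothetical crontab entry: run the scraper at the top of every hour, append output to a log
0 * * * * Rscript /home/deploy/scrape_matches.R >> /home/deploy/scrape_matches.log 2>&1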
Bonus: no Selenium at all
If you don’t need Selenium, selenider also supports chromote, which talks directly to Chrome’s DevTools Protocol. No driver, no --user-data-dir headaches:
library(selenider)
library(rvest)
session <- selenider_session("chromote")
open_url("https://egamersworld.com/callofduty/matches")
s(".item_teams__cKXQT") |>
  elem_wait_until(is_present, timeout = 10)
src <- get_page_source()
elements <- html_elements(src, ".item_teams__cKXQT")
This often works out of the box on Linux.
Final thought
Moving scraping jobs to Linux often reveals weird edge cases you never hit on Windows, and this --user-data-dir issue was one of them. The fix turned out to be straightforward once I understood that Chrome itself, not Selenium, needed the argument.
Now my script runs quietly in the background on a schedule, reliably pulling match data without choking on profile locks. If you’re deploying your own selenider scrapers to Linux, start with a unique profile per run; you’ll save yourself a lot of head-scratching later.