How I Solved a Parallel Computing Issue in R with doMC and foreach on Linux

Running parallel R code on Linux can be much more efficient than on Windows, largely because Linux supports process forking, which the mclapply() function relies on. After getting parallel processing to work on Windows with doSNOW, I wanted to move my code to a Linux environment and make use of doMC.

It should have been a simple switch, but I ran into a frustrating and cryptic error when I tried to combine foreach with doMC, which runs its loops through mclapply(). This blog post documents how I debugged and fixed the issue, and how you can avoid it too.

The Initial Code

Here’s the first version of the code I tried to run on Linux:

foreachFunc <- function(Data) {
  RowFunction <- function(d) {
    chisq.test(d)$p.value
  }
  P <- as.matrix(apply(Data, 1, RowFunction))
  return(P)
}

library(doMC)
library(foreach)

number_of_cpus <- 4
cl <- makeCluster(number_of_cpus)
registerDoMC(cl)

Chunks <- c(1:NROW(Data_new)) %% 4

P <- foreach(i = 0:3, .combine = rbind, mc.cores = 4) %dopar% {
  foreachFunc(Data_new[Chunks == i, ])
}

stopCluster(cl)

The Error I Got

Error in mclapply(argsList, FUN, mc.preschedule = preschedule, mc.set.seed = set.seed,  : 
(list) object cannot be coerced to type 'integer'

What Went Wrong

The error message doesn’t make it obvious, but here’s what I discovered:

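  • I was mixing doMC and makeCluster(), which is a big no-no.
    • makeCluster() and stopCluster() are meant for doParallel or doSNOW.
    • doMC is simpler and works only on Unix-like systems. It uses mclapply() internally, so no cluster setup is required.
    • registerDoMC() expects a number of cores as its first argument. I passed it the cluster object, which is a list, so mclapply() most likely failed when it tried to convert that list to an integer core count. That matches the error message above.
  • I also mistakenly added mc.cores = 4 inside the foreach() call. mc.cores is an argument of mclapply(), not of foreach(), which treats extra named arguments as iteration variables.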
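
Both problems can be reproduced in isolation. The following is a minimal sketch, not the original failing script: fake_cluster is a hypothetical stand-in for a real cluster object, and the loop uses %do% so that no parallel backend is needed. It assumes the foreach package is installed.

```r
library(foreach)

# A cluster object is a list of node descriptions. This hand-made list is a
# hypothetical stand-in, so the sketch does not have to start real workers.
fake_cluster <- list(list(host = "localhost", rank = 1L))

# Converting such a list to an integer fails; capture the message instead of
# stopping the script.
coercion_error <- tryCatch(
  as.integer(fake_cluster),
  error = function(e) conditionMessage(e)
)
print(coercion_error)

# foreach() iterates over all named arguments together and stops at the
# shortest one. mc.cores has length 1, so it cuts the loop short.
with_extra_arg <- foreach(i = 0:3, mc.cores = 4) %do% i
without_extra_arg <- foreach(i = 0:3) %do% i

length(with_extra_arg)     # fewer iterations than intended
length(without_extra_arg)  # one result per value of i
```

The second half is the more dangerous one in practice, because it raises no error: the loop simply processes fewer chunks than expected.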

The Fix and Improved Code

Here’s the corrected, working version of the code:

library(doMC)
library(foreach)

# Register parallel backend
number_of_cpus <- 4
registerDoMC(cores = number_of_cpus)

# Simulate sample data
set.seed(123)
Data_new <- matrix(sample(1:10, 1000, replace = TRUE), nrow = 100)

# Function to apply chisq.test row-wise
foreachFunc <- function(Data) {
  RowFunction <- function(d) {
    chisq.test(d)$p.value
  }
  P <- apply(Data, 1, RowFunction)
  return(P)
}

# Split data into chunks for parallelism
Chunks <- c(1:NROW(Data_new)) %% number_of_cpus

# Run parallel computation
P <- foreach(i = 0:(number_of_cpus - 1), .combine = c) %dopar% {
  foreachFunc(Data_new[Chunks == i, ])
}

And just like that, no more errors: the computation ran in parallel.
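
One thing to watch with the modulo-based chunking: the combined vector comes back in chunk order (all rows of chunk 0, then chunk 1, and so on), not in the original row order. The sketch below shows one way to map the results back to their rows. It uses %do% so it runs without a backend, and row_p, n_chunks, row_index and P_by_row are names introduced here for illustration. The suppressWarnings() call is my addition to keep the output quiet when chisq.test() warns about small expected counts.

```r
library(foreach)

set.seed(123)
Data_new <- matrix(sample(1:10, 1000, replace = TRUE), nrow = 100)

n_chunks <- 4
Chunks <- seq_len(nrow(Data_new)) %% n_chunks

# Row-wise p-value, with approximation warnings silenced for this demo.
row_p <- function(d) suppressWarnings(chisq.test(d)$p.value)

# Same chunked loop as above, run sequentially with %do%.
P <- foreach(i = 0:(n_chunks - 1), .combine = c) %do% {
  apply(Data_new[Chunks == i, , drop = FALSE], 1, row_p)
}

# order(Chunks) lists the original row numbers in the order the loop
# visited them, so it tells us where each element of P belongs.
row_index <- order(Chunks)
P_by_row <- numeric(length(P))
P_by_row[row_index] <- P

# Compare against a plain serial run over the rows in their original order.
serial <- apply(Data_new, 1, row_p)
all.equal(P_by_row, serial)
```

An alternative design is to split the rows into contiguous blocks instead of using the modulo, in which case the combined result is already in row order.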

Extra Features I Added for Practice

Once the core code was working, I added a few useful enhancements to explore the power of parallelization further:

Add a Progress Bar (With pbapply)

If you want to see the progress of a long row-wise computation, pbapply is a helpful package. Note that this call replaces the serial apply(), not the foreach loop:

library(pbapply)
p_values <- pbapply(Data_new, 1, function(d) chisq.test(d)$p.value)

Benchmark Performance (With microbenchmark)

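Let’s measure whether the parallel code is actually faster. On a matrix as small as this one, the overhead of starting workers can outweigh the gain, so try it on your real data: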

library(microbenchmark)

microbenchmark(
  serial = apply(Data_new, 1, function(d) chisq.test(d)$p.value),
  parallel = foreach(i = 0:(number_of_cpus - 1), .combine = c) %dopar% {
    foreachFunc(Data_new[Chunks == i, ])
  },
  times = 5
)

Handle Errors Gracefully (With tryCatch)

chisq.test() can fail if the input isn’t suitable. I wrapped it in tryCatch() to avoid crashing mid-loop:

RowFunction <- function(d) {
  tryCatch({
    chisq.test(d)$p.value
  }, error = function(e) {
    NA
  })
}
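
To see the wrapper in action, here is a small sketch with one valid row and one row that chisq.test() rejects because it contains a negative count. The names safe_p_value, demo_counts and p_demo are made up for this example, and NA_real_ is used instead of NA so the result stays numeric.

```r
# Same idea as RowFunction above, under a hypothetical name.
safe_p_value <- function(d) {
  tryCatch(
    chisq.test(d)$p.value,
    error = function(e) NA_real_  # return a numeric NA instead of stopping
  )
}

# Two rows of made-up counts: the second contains a negative value,
# which chisq.test() refuses to process.
demo_counts <- rbind(
  valid   = c(10, 20, 30),
  invalid = c(-1, 2, 3)
)

p_demo <- apply(demo_counts, 1, safe_p_value)
print(p_demo)
```

The failing row yields NA while the valid row still gets its p-value, so one bad input no longer aborts the whole loop.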

Key Takeaways

  • Don’t use makeCluster() with doMC. It’s unnecessary, and passing the cluster object to registerDoMC() causes the error above.
  • Avoid mc.cores in foreach(). That argument belongs to mclapply().
  • Use registerDoMC(cores = X) as the only setup needed for doMC.
  • Wrap your test functions in tryCatch() to make your pipeline resilient.
  • For cross-platform compatibility, you’re better off using doParallel.

Cross-Platform Safe Version

If your code needs to run on both Windows and Linux, here’s a safer and more portable version using doParallel:

library(doParallel)
library(foreach)

number_of_cpus <- parallel::detectCores()
cl <- makeCluster(number_of_cpus)
registerDoParallel(cl)

Chunks <- c(1:NROW(Data_new)) %% number_of_cpus

P <- foreach(i = 0:(number_of_cpus - 1), .combine = c) %dopar% {
  foreachFunc(Data_new[Chunks == i, ])
}

stopCluster(cl)
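
A small caveat about the worker count: detectCores() can return NA when the number of cores is unknown, and claiming every core can leave the machine unresponsive. The helper below is a sketch of one way to pick a safer value. pick_workers() is a hypothetical function written for this post, and the default cap of 4 and the choice to leave one core free are assumptions, not recommendations from any package.

```r
library(parallel)

# pick_workers() is a hypothetical helper, not part of any package.
# It falls back to 1 worker when the core count is unknown, leaves one
# core free, and never exceeds max_workers.
pick_workers <- function(max_workers = 4L, detected = detectCores()) {
  if (is.na(detected)) {
    return(1L)
  }
  as.integer(max(1L, min(detected - 1L, max_workers)))
}

number_of_cpus <- pick_workers()
print(number_of_cpus)
```

The result can be passed to makeCluster() in place of the raw detectCores() value.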

Final Thought

Getting parallel computing to work across platforms can be tricky, especially when packages like doMC and doParallel behave so differently. I learned the hard way that Linux’s mclapply() is powerful but easy to misuse when you assume all parallel packages work the same.
