OpenBLAS bottlenecks multithreading benefits at 8 working threads despite setting BLAS threads to 1 #5589

@BioTurboNick


I have code that makes heavy use of the LAPACK syevr functions via Julia. The matrices are relatively small (at most 15x15). I'm processing chunks of video frames across multiple threads, and each thread performs millions of these operations. I've set BLAS threads to 1, which I understand to mean that OpenBLAS just runs on the parent thread that calls it. (Setting it to anything more than 1 generally tanks performance.)

However, what I've found is that no matter what size computer I run on, performance gains stop once I reach 8 working threads, and performance even worsens with many more. Somehow it seems that OpenBLAS, without itself doing multithreaded computation, is interfering with the higher-level multithreading?
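
For anyone wanting to reproduce the scaling behaviour, here is a minimal sketch of this kind of workload. It is not the actual code: `eigen_batch` and `run_parallel` are hypothetical names, random symmetric matrices stand in for the real data, and calling `LAPACK.syevr!` directly is an assumption about how the routine is reached.

```julia
using LinearAlgebra

# Keep the BLAS library single-threaded so that all parallelism
# comes from Julia tasks, as described above.
BLAS.set_num_threads(1)

# Hypothetical stand-in for the per-chunk work: solve `nmats` small
# symmetric eigenproblems (n x n, n = 15 by default) with syevr.
function eigen_batch(nmats::Int, n::Int=15)
    acc = 0.0
    for _ in 1:nmats
        A = rand(n, n)
        S = A + A'                      # dense symmetric Float64 matrix
        # jobz='N': eigenvalues only; range='A': all of them; uplo='U'.
        # syevr! overwrites S and returns (eigenvalues, eigenvectors).
        w, _ = LAPACK.syevr!('N', 'A', 'U', S, 0.0, 0.0, 0, 0, -1.0)
        acc += sum(w)
    end
    return acc
end

# Run `ntasks` independent batches concurrently and return the elapsed
# wall time in seconds. Each task does the same amount of work, so with
# ideal scaling the time stays flat as `ntasks` grows up to the number
# of available threads.
function run_parallel(ntasks::Int, nmats_per_task::Int)
    return @elapsed begin
        tasks = [Threads.@spawn eigen_batch(nmats_per_task) for _ in 1:ntasks]
        foreach(wait, tasks)
    end
end
```

Starting Julia with several threads (for example `julia -t 16`) and comparing `run_parallel(n, 100_000)` for increasing `n` up to `Threads.nthreads()` should show whether the wall time stays roughly constant or starts to grow past a certain thread count.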

If I switch to MKL with 1 thread, I see continued performance improvements through 48 CPUs.

I'm willing to poke around at this as much as I can myself; I'm just not sure where to begin. Where might the bottleneck be?
