
Intel solver resource balloon!  Anyone else get intel solver always out of core?

Question asked by Steven Soeder on Feb 8, 2016
Latest reply on Feb 10, 2016 by Steven Soeder

I've reported this to my VAR as a possible bug, but I also wanted to know whether anyone else is seeing the same behavior with the Intel direct solver.


Every time I use the Intel solver it uses ALL of my memory (seemingly regardless of how much memory the machine has) and goes out of core (to the disk) and then it fails, telling me there is insufficient memory for the out of core solution.


I've tried running this study on two different machines, and I get the same resource balloon behavior on both machines.

I watched the resource monitor during solution and the Intel solver eats up ALL of the available memory, goes out of core, and then fails.

This happens on both machines, EVERY time, on every problem I've tried (so far this is one assembly model, trying 5-6 different studies, including static non-linear and static linear).


TL;DR: I listed some KPIs at the bottom of this post comparing the Direct Sparse solvers against the Intel Direct solver on both machines.




Case study:

This is a linear static study on a 5 component assembly.

I've cut the assembly in half and I'm using planar symmetry.

I've simplified the components removing unnecessary features.

There are some non-penetration and some bonded contacts.

The whole thing is properly constrained and there is no movement or major non-linearities.

The problem does solve using direct & large problem direct sparse solvers, so I don't think there is an inherent issue with the setup or anything else in problem definition.

This particular study has around 2.6 million DOF, and some of the other versions where I'm attempting to achieve convergence with mesh controls and refinements, get upwards of 6.5 million DOF.  This is where the Intel direct solver is supposed to shine.

*Note: on the problems that get into 6-7 million DOF, the direct sparse solver does switch to large problem direct sparse.


The Intel solver definitely uses a lot more of the available resources, and it seems to (initially) solve large problems MUCH faster than Direct or LP Direct Sparse solvers, however it's totally useless as it bombs itself out every time.


Now it appears to me, from watching the KPIs available in Resource Monitor and the Windows 10 Task Manager, that the Intel solver is unable to free the memory it's using quickly enough to keep up with itself.  Here's what I mean:


Pre-balloon memory composition:

Lots of available free memory, a little in standby, very little "modified" memory. 


Early balloon memory composition:

Lots of memory being used, very little free. 

Seems to be ramping up and taking good advantage of the available memory.


Mid balloon memory composition:

No free or standby memory; "modified" memory is starting to overtake the memory in use.

Late balloon memory composition:

Almost everything is "modified" memory; it seems to me like the solver is unable to release this memory and its growth just spirals out of control.


Post balloon (balloon burst) memory composition:

The solver appears to have quit, and the memory has been freed back up again. 



Screenshots are from a video I recorded of the whole thing.

Solidworks Sim Pro Intel Solver Resource Balloon - YouTube

The ballooning starts around 21:33.


Forgive the iPhone footage. I tried three different screen recorders and they all failed to record the video and save it in a playable format.

My guess is that this is because of the massive resource issues.
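
For anyone trying to reproduce this without a screen recorder, logging memory usage to a CSV file during the solve should capture the same trend. The following is only a minimal sketch, assuming Python with the third-party psutil package installed; the function names and the output file name are made up for illustration. Note that psutil reports used and available memory, not the "modified" and "standby" categories that Resource Monitor shows.

```python
import csv
import time

GB = 1024 ** 3


def psutil_sample():
    """Return (used_gb, available_gb) for the whole machine.

    Assumption: the third-party psutil package is installed (pip install psutil).
    """
    import psutil
    vm = psutil.virtual_memory()
    return vm.used / GB, vm.available / GB


def collect_samples(count, interval_s=1.0, sample=psutil_sample,
                    clock=time.monotonic, sleep=time.sleep):
    """Take `count` memory samples, `interval_s` seconds apart.

    Returns a list of (elapsed_seconds, used_gb, available_gb) tuples.
    """
    rows = []
    start = clock()
    for i in range(count):
        used_gb, available_gb = sample()
        rows.append((round(clock() - start, 1),
                     round(used_gb, 2),
                     round(available_gb, 2)))
        if i < count - 1:
            sleep(interval_s)
    return rows


def write_csv(path, rows):
    """Write the collected samples to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["t_s", "used_gb", "available_gb"])
        writer.writerows(rows)


# Example (not run here): one sample per second for an hour while the solver runs.
# write_csv("solver_memory.csv", collect_samples(3600))
```

Started just before the study is run, a log like this would show when the memory ramp begins and how quickly available memory drops to zero, without the overhead of recording video.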


And here are the mesh details from that study:




Lastly, here are my notes on general comparisons of resource allocation (taking advantage of resources) between the Intel Direct and LP Direct Sparse solvers.


Quantitative data:

Machine 1:  Lenovo W530 mobile workstation; Intel i7-3920XM 4-core CPU; 16GB DDR3 PC16000 (2000MHz) RAM; 120GB SSD with 30GB free.

  • Large Problem Direct Sparse:
    • Typical CPU usage:  ≈20%
    • MAX CPU usage:  ≈75%
    • Typical Memory usage: 6-8GB, fairly stable for entire solution
    • MAX Memory usage: ≈8GB
  • Intel Direct:
    • Typical CPU usage: ≈20-30%
    • MAX CPU usage: ≈100%
    • Typical Memory usage:  6-8GB pre-balloon; balloons to 100% (16GB) and goes out of core.


Machine 2:  DELL Precision 7810 workstation; Intel Xeon E5-2650 10-core CPU; 32GB DDR4 PC4-17000 (2133MHz) RAM; 240GB SSD with 80GB free.

  • Large Problem Direct Sparse:
    • Typical CPU usage:  ≈15-20%
    • MAX CPU usage:  ≈70%
    • Typical Memory usage: 6-8GB, fairly stable for entire solution (doesn't seem to take advantage of this machine having twice the memory)
    • Total time to complete first step and reach 100% of the second step "establishing the stiffness matrix": ≈2.5 hours total solve time.
    • Time observed in this state (100% of "calculating the stiffness matrix", ≈27% total solution progress): >4 hours (>6.5 hours total solve time)
      • Note: this is time observed, because I left for the day at this point.  I don't know exactly how long it stayed in this state.
  • Intel Direct:
    • Typical CPU usage: ≈30%
    • MAX CPU usage: ≈70%
    • Typical Memory usage:  6-8GB pre-balloon; balloons to 100% (32GB) and goes out of core.
    • Time to complete first step "establishing element connectivity": ≈17 minutes
    • Time to reach 100% of the second step "establishing stiffness matrix": ≈15 seconds (≈17.25 minutes total solve time)
    • Time spent in this state (100% of "calculating the stiffness matrix", ≈27% total solution progress) up until the error message and solver quit: ≈10 minutes (≈27 minutes total solve time)


Qualitative data:

  • Large Problem Direct Sparse:
    • Relatively high valley to peak ratio in CPU and Memory usage
    • Lots of apparent "down time" where nothing seems to be happening.
      • My guess is that the solver is working through sequential steps here, where it can only do one thing at a time.
    • Never seems to take advantage of the CPU/Memory available - it seems like it would be just as fast on a machine half as powerful.  In fact, it pretty much is - as my desktop workstation (machine 2) has more than double the processor cores and double the RAM. 
  • Intel Direct:
    • Relatively short valley:peak ratio
    • Seems to solve MUCH faster; e.g. appears to finish equivalent tasks nearly 9X faster than the large problem direct solver. (17.25 minutes vs 2.5 hours)
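
The "nearly 9X" figure follows directly from the two Machine 2 timings quoted above. As a quick check in Python (the constant and function names are only illustrative, and the inputs are the approximate times observed, so the result is approximate too):

```python
# Approximate Machine 2 timings from the post: time to reach 100% of the
# stiffness-matrix step with each solver.
LP_DIRECT_SPARSE_MIN = 2.5 * 60    # about 2.5 hours, in minutes
INTEL_DIRECT_MIN = 17 + 15 / 60    # about 17 minutes plus about 15 seconds


def speedup(slow_minutes, fast_minutes):
    """Return how many times faster the quicker run reached the same point."""
    return slow_minutes / fast_minutes


ratio = speedup(LP_DIRECT_SPARSE_MIN, INTEL_DIRECT_MIN)
print(f"Intel Direct reached the same point about {ratio:.1f}x faster")
```

This only compares the time to reach the same solver stage; it says nothing about total solve time, since the Intel Direct run never completed.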