It is the small stuff that can be problematic. I have a system with four AMD Opteron 6272s, each with 16 cores, for a total of 64 cores. During testing with an OpenMP-based program doing integer operations, all 64 cores came up to speed.
However, when I ran another program that required floating-point calculations, performance suddenly dropped sharply. A quick look at the block diagram in the article below shows why.
http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/2
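The Bulldozer architecture groups cores in pairs called modules. Each module has two integer cores, but the two share a single floating-point unit, including its scheduler. So for floating-point-heavy work, the two cores in a module effectively act as one. With eight modules per Opteron 6272, the four sockets provide only 32 floating-point units, so you end up with 32 cores instead of 64.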