The post-SMP algorithm wasn't accounting for time on a given core
if that core was idle when the load interrupt fired.
Also track rescheduling ipis and condense the output slightly.
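
A minimal sketch of the idle-time case described above, not the actual
kernel code. The struct and function names (cpu_stats, stats_enter_idle,
stats_leave_idle, stats_sample_busy) are hypothetical, and timestamps are
assumed to be plain monotonically increasing counters. If the core is
still idle when the sample is taken, the in-progress idle period is
folded in instead of being dropped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cpu bookkeeping, for illustration only. */
struct cpu_stats {
    uint64_t idle_time;       /* idle time accumulated in this window */
    uint64_t last_idle_start; /* timestamp when the cpu last went idle */
    bool idle;                /* true while the idle thread is running */
};

static void stats_enter_idle(struct cpu_stats *c, uint64_t now) {
    c->idle = true;
    c->last_idle_start = now;
}

static void stats_leave_idle(struct cpu_stats *c, uint64_t now) {
    assert(c->idle);
    c->idle_time += now - c->last_idle_start;
    c->idle = false;
}

/* Called when the periodic load sample fires. Returns busy time for the
 * window that just ended and resets the accumulator for the next one. */
static uint64_t stats_sample_busy(struct cpu_stats *c, uint64_t now,
                                  uint64_t window) {
    uint64_t idle = c->idle_time;

    if (c->idle) {
        /* The cpu is idle right now: count the idle period still in
         * progress, then restart it at the window boundary. */
        idle += now - c->last_idle_start;
        c->last_idle_start = now;
    }

    c->idle_time = 0;
    assert(idle <= window);
    return window - idle;
}
```

Without the "if (c->idle)" branch, a core that stays idle across the
sample point would look fully busy for that window.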
-Each cpu's idle thread no longer sits in the run queue and is only
selected when no other threads are ready to run. This gives each cpu's
idle thread an implicit affinity for that cpu and reduces the amount
of idle thread thrashing without needing real affinity (or per cpu
run queues).
-Fix some bitmap logic in the mp_reschedule and mp_mbx_reschedule_irq
code paths that probably resulted in too few reschedules.
-Adjust where mp reschedule ipis are sent in the general thread path:
add an ipi to thread_resume, remove one from thread_create_etc.
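
The idle thread change in the first item can be sketched roughly as
follows. This is an illustration under assumptions, not the scheduler
itself: struct sched, sched_insert_tail and sched_pick_next are
hypothetical names, and a single shared run queue with one bitmap bit
per priority level is assumed. The idle threads live in a per-cpu slot
outside the queue and are returned only when the bitmap is empty:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PRIORITIES 32
#define MAX_CPUS 4

struct thread {
    struct thread *next; /* run queue link */
    int priority;        /* 0 (lowest) .. NUM_PRIORITIES-1 (highest) */
    const char *name;
};

/* Hypothetical scheduler state: one shared run queue, per-cpu idle. */
struct sched {
    struct thread *head[NUM_PRIORITIES];
    struct thread *tail[NUM_PRIORITIES];
    uint32_t bitmap;                /* bit p set => queue p is non-empty */
    struct thread *idle[MAX_CPUS];  /* never placed in the run queue */
};

static void sched_insert_tail(struct sched *s, struct thread *t) {
    assert(t->priority >= 0 && t->priority < NUM_PRIORITIES);
    t->next = NULL;
    if (s->tail[t->priority])
        s->tail[t->priority]->next = t;
    else
        s->head[t->priority] = t;
    s->tail[t->priority] = t;
    s->bitmap |= (uint32_t)1 << t->priority;
}

static struct thread *sched_pick_next(struct sched *s, unsigned cpu) {
    assert(cpu < MAX_CPUS);

    /* Nothing is ready: fall back to this cpu's own idle thread. */
    if (s->bitmap == 0)
        return s->idle[cpu];

    /* Find the highest non-empty priority level. */
    int p = NUM_PRIORITIES - 1;
    while (!(s->bitmap & ((uint32_t)1 << p)))
        p--;

    /* Pop the head of that queue and clear the bit if it drained. */
    struct thread *t = s->head[p];
    s->head[p] = t->next;
    if (!s->head[p]) {
        s->tail[p] = NULL;
        s->bitmap &= ~((uint32_t)1 << p);
    }
    t->next = NULL;
    return t;
}
```

Because an idle thread is only ever reachable through its own cpu's
slot, another cpu cannot pull it off the queue, which is where the
implicit affinity comes from.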