-Move the local apic driver to arch/x86
-Add routines to send IPIs between cpus
Something is unstable at the moment and the system crashes after a while
with random corruptions when using SMP.
Rearrange some of the cpu initialization code to be runnable on each cpu
as it comes up. Complete the 64bit bootstrap mechanism and call into C
code.
Makes it as far as trying to reschedule via an IPI. Need to implement a
local apic based IPI mechanism.
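Roughly what that will need to do, as a sketch (the xAPIC ICR offsets
and fields are architectural; the mapping variable and helper name are
illustrative):

    #include <stdint.h>

    #define APIC_ICR_LO 0x300   /* vector + delivery mode + status */
    #define APIC_ICR_HI 0x310   /* destination apic id in bits [31:24] */

    /* assumed to be mapped from the apic base MSR elsewhere */
    static volatile uint32_t *apic_mmio;

    static void apic_send_ipi(uint8_t dest_apic_id, uint8_t vector) {
        apic_mmio[APIC_ICR_HI / 4] = (uint32_t)dest_apic_id << 24;
        apic_mmio[APIC_ICR_LO / 4] = vector;   /* fixed delivery mode */
        while (apic_mmio[APIC_ICR_LO / 4] & (1u << 12))
            ;   /* spin until the delivery status bit clears */
    }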
It's getting too hard to maintain a single layout that works with both,
so go ahead and split it. Also redo the layout so it should be usable
with user space and syscall and sysenter instructions from either mode.
I hadn't noticed this before, but you can directly reference a global
variable in a load/store in assembly, which collapses an lla + ld/sd
3 instruction sequence into a 2 instruction pair, thanks to the 12 bit
offset field in the load/store.
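A minimal sketch of the two forms (symbol and function names are
illustrative):

    unsigned long some_global;   /* illustrative symbol */

    unsigned long read_slow(void) {
        unsigned long val;
        /* 3 instructions: lla expands to auipc + addi, then the ld */
        __asm__("lla %0, some_global\n\t"
                "ld  %0, 0(%0)" : "=r"(val));
        return val;
    }

    unsigned long read_fast(void) {
        unsigned long val;
        /* 2 instructions: auipc + ld, with the low 12 bits of the
           symbol's address folded into the load's offset field */
        __asm__("ld  %0, some_global" : "=r"(val));
        return val;
    }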
This fixes a problem where, once the text segment gets larger than ~1MB,
a raw jal instruction cannot reach its target. Using 'call' or 'tail'
lets the assembler emit a 2 instruction sequence that the linker later
relaxes if it can.
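For illustration, a sketch of the relaxable form ('far_target' and
'trampoline' are made-up names):

    void far_target(void) {}   /* illustrative target */

    /* 'tail' emits an auipc + jalr pair that the linker relaxes to a
       single jal (or j) when the target turns out to be within +-1MB */
    __asm__(
        ".global trampoline\n"
        "trampoline:\n"
        "    tail far_target\n"
    );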
When dropping from EL2 (or EL3), load vmpidr_el2 and vpidr_el2 with the
correct values to make sure EL1 sees the 'real' mpidr_el1 and midr_el1.
Though in most cases they're already configured by whatever firmware ran
before, there's no actual guarantee that they are, and they may be full
of random garbage.
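A sketch of what the drop path does here, run while still at EL2 (the
function name is illustrative):

    /* mirror the real id registers into their virtualized EL2 copies
       so EL1 reads the same values the hardware reports */
    static inline void fixup_virt_id_regs(void) {
        unsigned long val;

        __asm__ volatile("mrs %0, mpidr_el1" : "=r"(val));
        __asm__ volatile("msr vmpidr_el2, %0" :: "r"(val));

        __asm__ volatile("mrs %0, midr_el1" : "=r"(val));
        __asm__ volatile("msr vpidr_el2, %0" :: "r"(val));
    }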
Though using the named mnemonics is generally a better idea, it has the
unfortunate property of not working on older compilers. In this case,
these new registers are for the Sstc extension, which is new enough that
even reasonably recent compilers such as GCC 12.1 don't understand it.
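For example, writing the Sstc timer compare CSR by number instead of
name (0x14D is the architected encoding of stimecmp; the helper name is
made up):

    static inline void write_stimecmp(unsigned long val) {
        /* csr 0x14D == stimecmp; older assemblers reject the name */
        __asm__ volatile("csrw 0x14D, %0" :: "r"(val));
    }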
Fixes issue #410
Sadly this doesn't really work in all situations and only happens to
work with gcc + binutils for 32bit accesses, presumably because GNU as
replaces a literal 0 with wzr.
Clang doesn't understand it at all.
This reverts commit 6c14941dec.
When dump_mode_regs() runs on a fault, avoid printing the stack beyond
the current page. This prevents running past the stack base and hitting
a guard page when the stack usage is < 128 bytes.
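A sketch of the clamp, assuming a PAGE_SIZE constant and the 128 byte
dump window:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096u   /* assumed for the sketch */

    static void dump_stack_words(uintptr_t sp) {
        /* never read past the end of the page sp lives on, so a mostly
           unused stack can't walk off the base into the guard page */
        uintptr_t page_end = (sp & ~(uintptr_t)(PAGE_SIZE - 1)) + PAGE_SIZE;
        uintptr_t end = sp + 128;
        if (end > page_end)
            end = page_end;
        for (uintptr_t p = sp; p < end; p += sizeof(uint32_t))
            printf("0x%08" PRIx32 "\n", *(const uint32_t *)p);
    }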
Bug: 336957655
Test: crash test, observe double fault fixed
Change-Id: If49b5fe5e1651557d19bf18c4026224cfb038101
For older compilers (gcc 7.5.0 in particular), avoid using
-mgeneral-regs-only to override the floating point switches, since they
don't seem to understand that switch.
Instead, more properly add the floating point switches for a module or
source file compiled with float, which is more compatible across
compilers.
Was already added to arm64, but arch/arm hadn't picked up this feature
yet. Uncovered a few places here and there that weren't marking code as
float/no-float, but this fixes a problem where newer compilers are
starting to sneak in vector code because they can.
Issue #406
To work properly with some hypervisors on various architectures (ARM,
ARM64, x86), add global routines to allow access to MMIO registers via
architecturally defined accessors.
Add accessors for ARM, ARM64, and x86-32/64. Have the other arches
default to just using whatever the compiler emits.
Will need to generally move things off the legacy REG*() accessors
since they're really not safe going forward with what compilers emit.
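Roughly the shape of these accessors, sketched for one width (names
here are illustrative, not the exact API added):

    #include <stdint.h>

    static inline uint32_t mmio_read32(volatile uint32_t *addr) {
    #if defined(__aarch64__)
        uint32_t val;
        /* a single ldr keeps the access width architecturally defined,
           which is what trapping hypervisors need to decode */
        __asm__ volatile("ldr %w0, [%1]" : "=r"(val) : "r"(addr) : "memory");
        return val;
    #else
        return *addr;   /* other arches: whatever the compiler emits */
    #endif
    }

    static inline void mmio_write32(volatile uint32_t *addr, uint32_t val) {
    #if defined(__aarch64__)
        __asm__ volatile("str %w0, [%1]" : : "r"(val), "r"(addr) : "memory");
    #else
        *addr = val;
    #endif
    }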
A pretty simple mechanism: a list of extensions added to the
RISCV_EXTENSION_LIST make variable is expanded into an underscore
delimited string appended to the end of -march=. For example, listing
zba zbb would append _zba_zbb (extension names here are illustrative).
Simple, but it should work for now.
Instead of setting a counter of the number of secondaries to start, have
platform or target code pass in a list of harts to start. This allows
for discontinuities in the layout of the cpu harts, and handles cases
like some sifive based hardware where hart 0 is otherwise offline.
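A sketch of the shape of this; the hart numbers and helper are
illustrative:

    #include <stddef.h>
    #include <stdint.h>

    /* e.g. sifive hardware where hart 0 stays offline */
    static const uint32_t harts_to_start[] = { 1, 2, 3, 4 };

    static void start_hart(uint32_t hart) {
        /* stand-in: would use SBI HSM or a platform mailbox to kick it */
        (void)hart;
    }

    void platform_start_secondaries(void) {
        for (size_t i = 0; i < sizeof(harts_to_start) / sizeof(harts_to_start[0]); i++)
            start_hart(harts_to_start[i]);
    }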
For both 32 and 64bit x86, have each of the exception stubs, which push
a few words and branch to the common isr routine, be simply 16 byte
aligned to make it easy to calculate the offset from the main isr table.
This cleans up some complexity that was actually broken for interrupts
>= 0x80. A sketch of the stub layout follows the list below.
Also:
-Switch alignment directives to .balign
-Expand the x86-32 exception table to a full 256
-Remove an extraneous define
-Make sure the IDT is 8 or 16 byte aligned
-Use END_DATA and END_FUNCTION in the exception and gdt asm files
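A sketch of the 64bit stub layout (label names made up; real vectors
whose faults push a CPU error code would skip the dummy push):

    /* each stub is exactly 16 bytes apart, so the common routine can
       recover the vector as (stub address - isr_table_start) / 16 */
    __asm__(
        ".text\n"
        "interrupt_common:\n"       /* stand-in for the real common isr */
        "    hlt\n"
        ".balign 16\n"
        "isr_table_start:\n"
        ".set vec, 0\n"
        ".rept 256\n"
        ".balign 16\n"
        "    pushq $0\n"            /* dummy error code slot */
        "    pushq $vec\n"          /* >= 0x80 encodes as imm32, still fits */
        "    jmp interrupt_common\n"
        ".set vec, vec + 1\n"
        ".endr\n"
    );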
The context switch is now always performed inside the PendSV handler,
which greatly simplifies the code by reducing all switches to a single
path. This should also eliminate any race conditions during the switch.
Because we always enter PendSV for a switch, there is a slight
performance penalty when switching from a non-preempted thread to
another non-preempted thread (~40 cycles longer on an M4 compared to the
previous implementation).
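For reference, the way a switch gets requested on Cortex-M (the ICSR
address and PENDSVSET bit are architectural; the function name is
illustrative):

    #include <stdint.h>

    #define SCB_ICSR       (*(volatile uint32_t *)0xE000ED04u)
    #define ICSR_PENDSVSET (1u << 28)

    /* pend PendSV; the actual save/restore runs in the handler, which
       the hardware takes once no higher-priority exception is active */
    static inline void arch_request_context_switch(void) {
        SCB_ICSR = ICSR_PENDSVSET;
    }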
It will be removed in an upcoming CL, so remove it now so the future CL
applies cleanly.
Bump the major number of the structure in case there's a tool somewhere
that uses it.
-Hard set the spinlock type to uint32_t to be clearer
-Switch the free/held value to a simple 0 or 1
Previously, the code was writing the current cpu number into the
spinlock, which was only useful for debugging purposes. However, since
the atomic operation was an amoswap instead of a proper CAS, it would
end up overwriting the old cpu number with the new cpu number when it
tried to take the lock. It would still work as a spinlock, since the
value was !0, but it was falsely tracking which cpu actually held it.
To keep it simple, just switch to 0/1 and stick with the amoswap
instruction, which is much more efficient than an LR/SC pair to
implement CAS on riscv.
Internally still use an unsigned long for the old value: since the
amoswap instruction overwrites the entire Rd register, this keeps the
codegen efficient because the compiler won't try to sign extend it for
any comparisons afterwards.
Thanks @loqoman (darwin.s.clark@gmail.com) for catching this one.
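A minimal sketch of the resulting lock and unlock (names illustrative):

    #include <stdint.h>

    typedef volatile uint32_t spin_lock_t;

    static inline void arch_spin_lock(spin_lock_t *lock) {
        unsigned long old;
        do {
            /* atomically store 1 and get the previous value back; the
               amo writes all of Rd, so keeping 'old' an unsigned long
               avoids a sign extension before the comparison below */
            __asm__ volatile("amoswap.w.aq %0, %2, %1"
                             : "=r"(old), "+A"(*lock)
                             : "r"(1u)
                             : "memory");
        } while (old != 0);
    }

    static inline void arch_spin_unlock(spin_lock_t *lock) {
        __asm__ volatile("amoswap.w.rl zero, zero, %0"
                         : "+A"(*lock)
                         :
                         : "memory");
    }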
Precisely set bits [55:12] of the vaddress in bits [43:0] for the
vae1is and vaae1is TLBI commands.
On platforms where FEAT_TTL is implemented, bits [47:44] of the command
accept a TTL parameter which can optionally be set to hint at the
translation table level containing the address being invalidated.
However, implementations aren't architecturally required to perform the
invalidation if the hint is incorrect. Invalidations may therefore fail
with the current implementation if the vaddress has bits set in
[59:56], since those are the bits that land in the TTL field.
This is notably an issue on ARM fast models, which don't perform the
invalidation when the TTL parameter is incorrect.
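A sketch of the masking for the non-ASID variant (the shift and field
positions follow the text above):

    #include <stdint.h>

    static inline void tlbi_vaae1is(uintptr_t vaddr) {
        /* VA[55:12] -> operand bits [43:0]; keep everything above
           clear so the TTL field in [47:44] reads as 'no hint' */
        uint64_t op = (vaddr >> 12) & ((1ULL << 44) - 1);
        __asm__ volatile("tlbi vaae1is, %0" : : "r"(op) : "memory");
    }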
Clang's assembler rejects expressions containing e.g. (1u << N).
Instead of using numeric expressions for per-privilege level CSRs, we
can prepend `m` or `s` to the base CSR name. This also lets the
assembler assign the CSR encoding instead of having to hardcode it in
the source code.
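A hypothetical sketch of the prefixing scheme (macro names made up):

    /* paste the privilege prefix onto the CSR base name and let the
       assembler pick the encoding */
    #define XMODE_PREFIX  s          /* would be m for machine mode */
    #define CONCAT_(a, b) a##b
    #define CONCAT(a, b)  CONCAT_(a, b)
    #define STR_(x)       #x
    #define STR(x)        STR_(x)
    #define XCSR(name)    STR(CONCAT(XMODE_PREFIX, name))

    static inline unsigned long read_xstatus(void) {
        unsigned long val;
        /* expands to: csrr %0, sstatus */
        __asm__ volatile("csrr %0, " XCSR(status) : "=r"(val));
        return val;
    }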
The current code results in
`error: invalid reassignment of non-absolute variable 'isr_stub_start'`.
Use a numbered label instead (as that can be reassigned) and reference
the last occurrence using the b suffix.
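A sketch of the pattern, 64bit x86 flavored to match the stub code
above (everything here is illustrative):

    __asm__(
        ".balign 16\n"
        "1:\n"                      /* numbered labels can be redefined */
        "    pushq $0\n"
        "    jmp common_stub\n"
        "    .org 1b + 16\n"        /* pad to 16 bytes past the last '1:' */
        "1:\n"                      /* no reassignment error from clang */
        "    pushq $1\n"
        "    jmp common_stub\n"
        "    .org 1b + 16\n"
        "common_stub:\n"
        "    hlt\n"
    );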