Though using the named mnemonics is generally a better idea, it has the
unfortunate property of not working on older compilers. In this case,
these new registers are for the Sstc extension, which is new enough
that even reasonably recent compilers such as GCC 12.1 don't understand it.
Fixes issue #410
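Purely as an illustration of the numeric fallback (not the actual
change), with 0x14d being stimecmp's encoding per the Sstc spec:

    // write the CSR by number so assemblers that predate the Sstc
    // names can still build the file
    static inline void riscv_stimecmp_write(unsigned long val) {
        __asm__ volatile("csrw 0x14d, %0" :: "r"(val));
    }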
Sadly this doesn't really work in all situations, and only happens to
work with gcc + binutils for 32bit accesses, presumably because GNU as
replaces a literal 0 with wzr.
Clang doesn't understand it at all.
This reverts commit 6c14941dec.
When dump_mode_regs() runs on a fault, avoid printing the stack beyond
the current page. This prevents exceeding the stack base and hitting a
guard page in the case where stack usage is < 128 bytes (see the sketch
below).
Bug: 336957655
Test: crash test, observe double fault fixed
Change-Id: If49b5fe5e1651557d19bf18c4026224cfb038101
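A minimal sketch of the bound, assuming lk/macros.h style ROUNDUP/MIN
helpers and a hypothetical entry point:

    #include <stddef.h>
    #include <stdint.h>

    // clamp the dump to the page containing sp so it cannot cross into
    // a guard page when less than 128 bytes of stack remain
    static void dump_stack_bounded(uintptr_t sp) {
        uintptr_t page_end = ROUNDUP(sp + 1, PAGE_SIZE);
        size_t len = MIN((size_t)128, (size_t)(page_end - sp));
        hexdump((const void *)sp, len);
    }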
For older compilers (gcc 7.5.0 in particular), avoid using
-mgeneral-regs-only to override the floating point switches, since it
doesn't seem to understand that switch.
Instead, more properly add the floating point switches for any module or
source file compiled with float, which is more compatible across
compilers.
This was already added to arm64, but arch/arm hadn't picked up the
feature yet. Doing so uncovered a few places here and there that weren't
marking code as float/no-float, but this fixes a problem where newer
compilers are starting to sneak in vector code because they can.
Issue #406
To work properly with some hypervisors on various architectures (ARM,
ARM64, x86), add global routines to allow access to MMIO registers via
architecturally defined accessors.
Add accessors for ARM, ARM64, and x86-32/64. Have the other arches
default to just using whatever the compiler emits.
Will need to generally move things off the legacy REG*() accessors,
since they're really not safe going forward with what compilers emit.
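A sketch of the idea for the ARM64 case, with hypothetical names:
pinning the exact instruction form gives a trapping hypervisor a
syndrome it can decode.

    #include <stdint.h>

    // a bare ldr/str with a register-indirect address: a single
    // aligned access the compiler cannot split, widen, or convert
    // into a form with an undecodable trap syndrome
    static inline uint32_t mmio_read32(const volatile uint32_t *addr) {
        uint32_t val;
        __asm__ volatile("ldr %w0, [%1]" : "=r"(val) : "r"(addr) : "memory");
        return val;
    }

    static inline void mmio_write32(volatile uint32_t *addr, uint32_t val) {
        __asm__ volatile("str %w0, [%1]" :: "r"(val), "r"(addr) : "memory");
    }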
A pretty simple mechanism: a list of extensions added to the
RISCV_EXTENSION_LIST make variable is expanded into an underscore
delimited string appended to the end of -march=.
It should work for now.
Instead of setting a counter of the number of secondaries to start, have
platform or target code pass in a list of harts to start. This allows
for discontinuities in the layout of the cpu harts, or, in the case of
some SiFive based hardware, for hart 0 being otherwise offline.
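The shape of the new interface might look roughly like this; the
function name and hart numbering are hypothetical:

    #include <stddef.h>

    #define countof(a) (sizeof(a) / sizeof((a)[0]))

    // hypothetical: the kernel starts exactly the harts it is handed
    void riscv_boot_secondaries(const unsigned int *harts, size_t count);

    // hart 0 is offline on some SiFive parts, so it simply isn't listed
    static const unsigned int harts_to_start[] = { 1, 2, 3, 4 };

    void platform_start_secondaries(void) {
        riscv_boot_secondaries(harts_to_start, countof(harts_to_start));
    }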
For both 32 and 64bit x86, make each of the exception stubs, which push
a few words and branch to the common isr routine, simply 16 byte aligned
so it's easy to calculate a stub's offset from the main isr table (see
the sketch after the list below). This cleans up some complexity that
was actually broken for interrupts >= 0x80.
Also:
- Switch alignment directives to .balign
- Expand the x86-32 exception table to a full 256 entries
- Remove an extraneous define
- Make sure the IDT is 8 or 16 byte aligned
- Use END_DATA and END_FUNCTION in the exception and gdt asm files
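The payoff, sketched with hypothetical names: finding a vector's stub
becomes a multiply instead of a table lookup.

    #include <stdint.h>

    extern uint8_t isr_stub_start[];  // start of the 16-byte-aligned stubs
    void idt_set_gate(unsigned vec, uintptr_t handler);  // hypothetical

    static void wire_idt(void) {
        for (unsigned i = 0; i < 256; i++) {
            // each stub is padded to exactly 16 bytes
            idt_set_gate(i, (uintptr_t)isr_stub_start + i * 16u);
        }
    }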
The context switch is now always performed inside the PendSV handler,
which greatly simplifies the code by reducing all switches to a single
path. This should also eliminate any race conditions during the switch.
Because we always enter PendSV for a switch, there is a slight
performance penalty when switching from a non-preempted thread to
another non-preempted thread (~40 cycles longer on an M4, compared to
the previous implementation).
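For reference, requesting a switch then reduces to pending the
exception; the register layout below is architectural (ARMv7-M ICSR),
the function name is not.

    #include <stdint.h>

    #define SCB_ICSR       (*(volatile uint32_t *)0xe000ed04)
    #define ICSR_PENDSVSET (1u << 28)

    // take PendSV as soon as priority allows; the handler then runs
    // the one and only context switch path
    static void pend_context_switch(void) {
        SCB_ICSR = ICSR_PENDSVSET;
    }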
It will be removed in an upcoming CL, so remove it now so the future CL
applies cleanly.
Bump the major number of the structure in case there's a tool somewhere
that uses it.
- Hard set the spinlock type to uint32_t to be clearer
- Switch the free/held value to a simple 0 or 1
Previously, the code was writing the current cpu number into the
spinlock, which was only useful for debugging purposes. However, since
the atomic operation was an amoswap instead of a proper CAS, it would
end up overwriting the old cpu number with the new cpu number when
another cpu tried to take the lock. It would still work as a spinlock,
since the value was !0, but it was falsely tracking which cpu actually
held it.
To keep it simple, just switch to 0/1 and stick with the amoswap
instruction, which is much more efficient than an LR/SC pair to
implement CAS on riscv.
Internally, still use an unsigned long for the old value, since the
amoswap instruction overwrites the entire Rd register; this keeps the
codegen efficient since the compiler won't try to sign extend it for
any comparisons afterwards.
Thanks @loqoman (darwin.s.clark@gmail.com) for catching this one.
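In C11-atomics terms the scheme reduces to roughly this sketch; the
real code is riscv inline assembly:

    #include <stdatomic.h>
    #include <stdint.h>

    typedef _Atomic uint32_t spin_lock_t;

    static void arch_spin_lock(spin_lock_t *lock) {
        // atomic_exchange lowers to amoswap.w on riscv; since the lock
        // only ever holds 0 (free) or 1 (held), the unconditional swap
        // can no longer clobber any useful state
        while (atomic_exchange_explicit(lock, 1, memory_order_acquire) != 0)
            ;
    }

    static void arch_spin_unlock(spin_lock_t *lock) {
        atomic_store_explicit(lock, 0, memory_order_release);
    }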
Precisely set bits [55:12] of the vaddress in bits [43:0] for the vae1is
and vale1is TLBI commands.
On platforms where FEAT_TTL is implemented, bits [47:44] of the command
accept a TTL parameter which can optionally be set to hint at the
translation table level containing the address being invalidated.
However, implementations aren't architecturally required to perform the
invalidation if the hint is incorrect. Invalidations may therefore fail
with the current implementation if the vaddress has bits set in
[58:55].
This is notably an issue on ARM Fast Models, which don't perform the
invalidation when the TTL parameter is incorrect.
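The fixed computation is roughly the following sketch, assuming a 4K
granule so that operand bits [43:0] carry VA[55:12]:

    #include <stdint.h>

    // mask after the shift so stray high vaddr bits cannot leak into
    // the TTL field at [47:44] and turn the invalidate into a bad hint
    static inline void arm64_tlbi_vae1is(uintptr_t vaddr) {
        uint64_t op = (vaddr >> 12) & ((1ull << 44) - 1);
        __asm__ volatile("tlbi vae1is, %0" :: "r"(op) : "memory");
    }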
Clang's assembler rejects expressions such as (1u << N). Instead of
using numeric expressions for the per-privilege level CSRs, we can
prepend `m` or `s` to the CSR name. This also lets the compiler assign
the CSR encoding instead of having to hardcode it in the source code.
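A sketch of the prepend trick; the prefix macro here is made up:

    #define XMODE_PREFIX "s"  /* "m" for machine-mode builds */

    // the assembler resolves "sstatus" vs "mstatus" by name, so the
    // source never hardcodes a CSR number
    static inline unsigned long riscv_xstatus_read(void) {
        unsigned long val;
        __asm__ volatile("csrr %0, " XMODE_PREFIX "status" : "=r"(val));
        return val;
    }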
The current code results in
`error: invalid reassignment of non-absolute variable 'isr_stub_start'`.
Use a numbered label instead (as that can be reassigned) and reference
the last occurrence using the b suffix.
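Reduced to a toy (kept in C via a top-level asm block), the pattern
looks like this:

    // '0:' may be defined any number of times, unlike a symbolic name;
    // '0b' always binds to the nearest preceding definition
    __asm__(
        ".rept 4\n"
        "0:\n"
        "nop\n"
        ".skip 16 - (. - 0b)\n"  /* pad each stub out to 16 bytes */
        ".endr\n"
    );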
Clang does not accept this .if condition since phys_offset is a register
alias and not an absolute expression. We can keep these two instructions
here even when the argument is zero, since the result will be the same.
Additionally, this macro is only called once and always passes a non-zero
argument. If more calls are added in the future and avoiding these two
instructions just before a loop is really important, we could use
`.ifnc \phys_offset,0` instead, but that looks rather obscure to me.
Update arm-m systick current_time() and current_time_hires() to advance
monotonically even when interrupts are disabled.
The previous implementation relied on the systick exception triggering
immediately when the counter wrapped and incrementing the current_ticks
count. But if called while interrupts are disabled, the systick has not
had a chance to trigger and increment the count even though the counter
has already wrapped around. This would result in the current_time()
value moving backwards.
The implementation in this commit is as follows:
- Access to the systick val csr is always done in conjunction with a
check of the COUNTFLAG bit, which indicates counter reaching zero.
- The global current_ticks count is incremented immediately when
wrap around is detected via COUNTFLAG, rather than waiting for the
systick exception to run.
- The check of the counter value, COUNTFLAG bit, and current_ticks
global count are always done in a critical section to avoid race
conditions between current_time and the systick handler.
- The critical section and workarounds are consolidated into a helper
function shared by current_time() and current_time_hires() to
atomically get the tick count and the cycles since the last tick.
The effect should be that current_time() always returns an accurate,
monotonically increasing value, regardless of whether interrupts are
enabled.
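A condensed sketch of the helper; names are invented, but the SysTick
register layout is architectural and reading the control register
clears COUNTFLAG:

    #include <stdint.h>

    #define SYST_CSR      (*(volatile uint32_t *)0xe000e010)
    #define SYST_RVR      (*(volatile uint32_t *)0xe000e014)
    #define SYST_CVR      (*(volatile uint32_t *)0xe000e018)
    #define CSR_COUNTFLAG (1u << 16)

    static volatile uint64_t current_ticks;

    // returns the tick count; *cycles gets the down-counter cycles
    // since that tick; the critical section calls stand in for the
    // kernel's own primitives
    static uint64_t ticks_and_cycles(uint32_t *cycles) {
        enter_critical_section();
        uint32_t val = SYST_CVR;
        if (SYST_CSR & CSR_COUNTFLAG) {
            // wrapped, but the exception hasn't run yet: account for
            // the tick now (the read cleared COUNTFLAG, so the handler
            // won't count it twice)
            current_ticks++;
            val = SYST_CVR;  // re-read relative to the new tick
        }
        uint64_t ticks = current_ticks;
        exit_critical_section();

        *cycles = SYST_RVR - val;  // systick counts down from reload
        return ticks;
    }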
Even though we only need 32 bits here, clang warns that we should be
using a "w" register in the inline assembly (which is not legal with
mrs/msr). Silence the warning by declaring the value as unsigned long.
(cherry picked from commit a0de5d88dfc67b3ba34c0455b1619e12e6cfccae)
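For instance (the register choice here is just an example):

    // only the low 32 bits matter, but mrs/msr operate on x registers,
    // so unsigned long avoids clang asking for an illegal w register
    static inline unsigned long arm64_read_cntfrq(void) {
        unsigned long val;
        __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(val));
        return val;
    }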
Add a few more #defines for the status register
Tweak the FPU context switch to not rewrite the entire status register,
which may not be a generally safe thing to do.
Update the comment regarding clobbers on the SBI call.
Tweak a boot time print of the SBI version.
Add a commented out line to the do-qemuriscv script to assist in
dumping future device trees.
Move the mmu_initial_mapping from platform into arch/x86. For this
architecture the default mappings are basically hard coded in arch/x86
anyway, so move ownership of this data there, closer to where it's
actually initialized.
Update the 32 and 64bit paging code to properly use the paddr_to_kvaddr
and vice versa routines.
Update to allocate page tables directly from the pmm instead of the heap
(on 32bit code).
Clean up how constants are defined in the x86.h file a bit.
Switch rdtsc to the compiler builtin (see the sketch below).
Add all of the known bits for the main CR registers.
Move the invlpg macro over to the common header.
Update comments in start.S
Trim the arch mmu unit tests accordingly.
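The rdtsc change amounts to this; __rdtsc comes from x86intrin.h on
both gcc and clang:

    #include <stdint.h>
    #include <x86intrin.h>

    static inline uint64_t read_tsc(void) {
        // the builtin replaces hand-rolled rdtsc inline assembly
        return __rdtsc();
    }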
Should probably switch this to a #define, but it's possible some of
these queries could be dynamically detected (XN for example). May
revisit at some point.
- Large cleanup of both the 32 and 64bit mmu code. Still the same
general flow, but tightens up the mixing of physical addresses with
virtual addresses.
- Remove the PAE code path from the 32bit mmu, which was unused.