This will generally turn off more FPU codegen, even if its using
software fallback unless the project/target/platform selects a cpu that
has FPU support. This also turns off a few blocks of test code and the
upcoming floating point printf if it's not present on the arch.
This may break projects that were compiling for say cortex-m0 but
expected FPU code to be present. If so it should be pretty easy to
override it, but not going to add that yet unless it's necessary.
The test inside riscv/rules.mk was assuming gcc and that the CC variable
isn't passed in from the user. This is not a very clean solution and
acts like a bandaid over the problem. Added some todos for a potential
solution.
No real functional change except how the smaller ARCH_DEFAULT_PAGE_SIZE
is now computed and set in defines.h instead of rules.mk for arch/arm to
be consistent with the other arch that has a large/small build (riscv).
This triggered a out of range count overflow warning when
building for RV32 (calling the macro with `ticks >> 32`)
since the truncation happened before the shift.
I hadn't noticed this before, but you can directly reference a global
variable in a load/store in assembly, which combines a lla + ld/sd into
a 2 instruction pair instead of 3 due to the 12 bit offset provided in
the load/store.
This fixes a problem if the text segment gets larger than ~1MB where the
raw jal instruction cannot reach. Using 'call' or 'tail' allows the
assembler to emit a 2 instruction sequence that the linker later
relaxes if it can.
Though using the named nmemonics is a generally better idea it has the
unforunate property of not working on older compilers. In this case,
these new registers are for the Sstic extension, which is new enough
that even reasonably recent compilers as GCC 12.1 doesn't understand it.
Fixes issue #410
To work properly with some hypervisors on various architectures (ARM,
ARM64, x86), add global routines to allow access to MMIO registers via
architecturally defined accessors.
Add accessors for ARM, ARM64, and x86-32/64. Have the other arches
default to just using whatever the compiler emits.
Will need to generally move things off the legacy REG*() accessors
since they're really not safe going forward with what compilers emit.
A pretty simple mechanism, a list of extensions added to
RISCV_EXTENSION_LIST make variable is expanded to an underscore
delimited string appended to the end of -march=
Pretty simple but it should work for now.
Instead of setting a counter of the number of secondaries to start, have
platform or target code pass in a list of harts to start instead. This
allows for there to be discontinuties in the layout of the cpu harts, or
in the case of some sifive based hardware, hart 0 is otherwise offline.
-Hard set the spinlock type to uint32_t to be clearer
-Switch the free/held value to a simple 0 or 1
Previously, was writing the current cpu number into the spinlock, which
was only useful for debugging purposes. However, since the atomic
operation was an amoswap instead of a proper CAS, it would end up
overwriting the old cpu number with the new cpu number when it tried to
take it. It would still work as a spinlock, since the value was !0, but
it was falsely tracking which cpu actually held it.
To keep it simple, just switch to 0/1 and stick with the amoswap
instruction, which is much more efficient than a LR/SC pair to implement
CAS on riscv.
Internally still use an unsigned long for the old value, since the
amoswap instruction overwrites the entire Rd register, and thus keeps
the codegen efficient since it wont try to sign extend it for any
comparisons afterwards.
Thanks @loqoman (darwin.s.clark@gmail.com) for catching this one.
Clang's assembler rejects expressions containing e.g. (1u << N) in the
assembler. Instead using numeric expressions for per-privilege level
CSRs, we can prepend `m` or `s`. This also lets the compiler assign the
CSR encoding instead of having to hardcode it in the source code.
Add a few more #defines for the status register
Tweak the FPU context switch to not rewrite the entire status register,
which may not be a generally safe thing to do.
Updated comment regarding clobbers on the SBI call.
Tweak a boot time print of SBI version.
Add a commented out line to the do-qemuriscv to assist in dumping future
device trees.
Trim the arch mmu unit tests accordingly.
Should probably switch this to a #define, but it's possible some of
these queries could be dynamically detected (XN for example). May
revisit at some point.
Allow asking the arch layer if it supports NX pages or NS pages.
Have the arch mmu test code test accordingly.
Also tweak the tests to pass on arm32 mmu, which does not precisely
match the return semantics of the rest of the mmu routines on map/unmap.
Based on building with --warn-undefined-variables, find a few places in
the build system where undefined variables were used incorrectly, or
never set due to unused code.
When building fpu variant, use -march=rv64imafdc instead of rc64gc
since some older compilers and/or mainline do not understand the
equivalence when selecting libgcc.
This is only an issue on rv64 due to the need to pick out the medany
variant of libgcc.
Currently only implemented for double precision floating point.
Caveat: currently unable to only compile some code with or without
float. The linker is extremely picky about mixing float and no-float
objects, so stick with all on or off for now.
It's not as much of a problem currently because the toolchain is not
using any riscv vector instructions to assist normal code, so it's
generally only emitting fpu instructions for floating point code.
Have the arch define additional compiler flags to explicit support or
not support a floating point unit.
Add ability for modules to per file or for the whole module mark code
as needing floating point support.
Add default flags for arm64, riscv, and x86 toolchains.
Needed because gcc 12 is getting much more aggressive about using vector
instructions for non float code, so getting away with avoiding it was
no longer working.
Still not perfect: printf code is being compiled with float, so it's
possible to use floating point instructions inside core kernel or
interrupt handling code if a printf is used.
Possibly will have problems on architectures where mixing float and non
float code at the linker generates issues, but so far seems to be okay.
This works around an issue with newer compilers that default to an
earlier ISA spec that doesn't by default include the zicsr extension by
default.
If the compiler doesn't support the switch, then it's assumed that it
has the extension by default, as older gcc compilers did.
Previously was hard coding the instructions to work around a limitation
of the assembler that did not allow using fpu instructions when the code
was being compiled without support. Move the zeroing routine into a
separate assembly file and override the architure at the top.
Add build system support for at least being aware of the FPU on
the architecture, not building code to use it.
At the moment, only sets up the FPU into Initial state prior to
entering user space and then ignores it.
For user space support, the sscratch register cannot hold the pointer to
the current cpu, as much as it is convenient.
Change the logic to use tp register (x4) to point to percpu, and
dereference the local thread from it directly.
Kind of wasteful, but much simpler than having to manually sync every time something changes
in the kernel aspace. I think riscv machines with mmu can waste 1MB of page tables up front.
Can revisit later if needed.
Up until now the bottom part of ram has been identity mapped, left over
from initial bootstrapping. Set up two top level page tables: one with the
the identity map and one without. Once the kernel starts switch to the second
but keep the former around for bootstrapping secondary cpus.
Start adding support for user address spaces, currently mostly untested.
Still have to solve the problem of keeping the kernel parts of the page tables
in sync. Will probably preallocate all of the ones needed.
Move out of inline routines since the body is relatively
large and to keep the disassembly clean. Have spinlocks
store the holder cpu + 1 instead of just 1. Add an appropriate
barrier to the release.