Commit Graph

94 Commits

Author SHA1 Message Date
Jim Mussared bbd8760bd9 all: Update Python formatting to ruff-format.
This updates a small number of files that change with ruff-format's (vs
black's) rules.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2023-11-03 13:30:41 +11:00
Jim Mussared 64c79a5423 py/qstr: Add support for sorted qstr pools.
This provides a significant performance boost for qstr_find_strn, which is
called a lot during parsing and loading of .mpy files, as well as interning
of string objects (which happens in most string methods that return new
strings).

Also adds comments to explain the "static" qstrs.  These are part of the
.mpy ABI and avoid needing to duplicate string data for QSTRs known to
already be in the firmware.  The static pool isn't currently sorted, but in
the future we could either split the static pool into the sorted regions,
or in the next .mpy version just sort them.

Based on initial work done by @amirgon in #6896.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2023-10-30 11:10:02 +11:00
Damien George 6967ff3c58 py/persistentcode: Bump .mpy sub-version.
This is required because the previous commit changed the .mpy native ABI.

Signed-off-by: Damien George <damien@micropython.org>
2023-10-16 11:25:31 +11:00
Angus Gratton 1a5c9b9da4 tools/mpy-tool.py: Ignore linter failure in Python 2 compatibility code.
Found by Ruff checking F821.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
2023-08-16 16:16:44 +10:00
Angus Gratton 597fcb4751 tools/mpy-tool.py: Use isinstance() for type checking.
Ruff version 283 expanded E721 to fail when making direct comparison
against a built-in type.  Change the code to use isinstance() as
suggested, these usages appear to have equivalent functionality.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
2023-08-09 17:41:54 +10:00
Christian Clauss 2a1db770ce all: Fix cases of Python variable assigned but never used.
This fixes ruff rule F841.
2023-05-02 16:36:05 +10:00
Martin Milata 850f09b109 tools/mpy-tool.py: Initialize line_info_top.
Without it the line number mapping doesn't work.

Signed-off-by: Martin Milata <martin@martinmilata.cz>
2023-02-01 13:17:22 +11:00
Jim Mussared 1ba0e8ff96 py/persistentcode: Only emit sub-version if generated code has native.
In order for v1.19.1 to load a .mpy, the formerly-feature-flags which are
now used for the sub-version must be zero.

The sub-version is only used to indicate a native version change, so it
should be zero when emitting bytecode-only .mpy files.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-10-25 14:57:04 +11:00
Jim Mussared d94141e147 py/persistentcode: Introduce .mpy sub-version.
The intent is to allow us to make breaking changes to the native ABI (e.g.
changes to dynruntime.h) without needing the bytecode version to increment.

With this commit the two bits previously used for the feature flags (but
now unused as of .mpy version 6) encode a sub-version.  A bytecode-only
.mpy file can be loaded as long as MPY_VERSION matches, but a native .mpy
(i.e. one with an arch set) must also match MPY_SUB_VERSION.  This allows 3
additional updates to the native ABI per bytecode revision.

The sub-version is set to 1 because the previous commits that changed the
layout of mp_obj_type_t have changed the native ABI.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Signed-off-by: Damien George <damien@micropython.org>
2022-09-19 23:19:55 +10:00
Damien George 2fb413b265 tools/mpy-tool.py: Improve generated frozen identifiers.
Frozen identifiers now include their full name hierarchy, eg their class
name.  This makes it easier to understand the generated code.

Signed-off-by: Damien George <damien@micropython.org>
2022-06-07 16:55:18 +10:00
Damien George 599a22e569 tools/mpy-tool.py: Rework .mpy merging feature.
Now that the native qstr link table is gone, merging a native .mpy file
with a bytecode .mpy file is not as simple as concatenating the .mpy data.
The qstr_table and obj_table tables from all merged .mpy files must now be
joined together, because they are global to the .mpy file (and hence global
to the merged .mpy file).  This means the bytecode needs to be be decoded,
qstr_table and obj_table indices updated to point to the correct entries in
the new tables, and then the bytecode re-encoded.

This commit makes this change to the merging feature in mpy-tool.py.  This
can now merge an arbitrary number of bytecode .mpy files, and up to one
native .mpy file.

Signed-off-by: Damien George <damien@micropython.org>
2022-06-07 13:51:45 +10:00
Damien George f506bf342a py/bc: Remove unused mp_opcode_format function.
This was made redundant by f2040bfc7e, which
also did not update this function for the change to qstr-opcode encoding,
so it does not work correctly anyway.

Signed-off-by: Damien George <damien@micropython.org>
2022-06-07 13:32:38 +10:00
Damien George b37b578214 py/persistentcode: Remove remaining native qstr linking support.
Support for architecture-specific qstr linking was removed in
d4d53e9e11, where native code was changed to
access qstr values via qstr_table.  The only remaining use for the special
qstr link table in persistentcode.c is to support native module written in
C, linked via mpy_ld.py.  But native modules can also use the standard
module-level qstr_table (and obj_table) which was introduced in the .mpy
file reworking in f2040bfc7e.

This commit removes the remaining native qstr liking support in
persistentcode.c's load_raw_code function, and adds two new relocation
options for constants.qstr_table and constants.obj_table.  mpy_ld.py is
updated to use these relocations options instead of the native qstr link
table.

Signed-off-by: Damien George <damien@micropython.org>
2022-06-07 13:19:55 +10:00
Damien George 1d047617bb tools/mpy-tool.py: Remove obsolete unicode flag in .mpy header.
This was removed in c49d5207e9

Signed-off-by: Damien George <damien@micropython.org>
2022-05-26 11:43:46 +10:00
Damien George d4d53e9e11 py/emitnative: Access qstr values using indirection table qstr_table.
This changes the native emitter to access qstr values using the qstr
indirection table qstr_table, but only when generating native code that
will be saved to a .mpy file.  This makes the resulting native code fully
static, ie it does not require any fix-ups or rewriting when it is
imported.

The performance of native code is more or less unchanged.  Benchmark
results on PYBv1.0 (using --via-mpy and --emit native) are:

N=100 M=100          baseline -> this-commit     diff      diff% (error%)
bm_chaos.py            407.16 ->     411.85 :   +4.69 =  +1.152% (+/-0.01%)
bm_fannkuch.py         100.89 ->     101.20 :   +0.31 =  +0.307% (+/-0.01%)
bm_fft.py             3521.17 ->    3441.72 :  -79.45 =  -2.256% (+/-0.00%)
bm_float.py           6707.29 ->    6644.83 :  -62.46 =  -0.931% (+/-0.00%)
bm_hexiom.py            55.91 ->      55.41 :   -0.50 =  -0.894% (+/-0.00%)
bm_nqueens.py         5343.54 ->    5326.17 :  -17.37 =  -0.325% (+/-0.00%)
bm_pidigits.py         603.89 ->     632.79 :  +28.90 =  +4.786% (+/-0.33%)
core_qstr.py            64.18 ->      64.09 :   -0.09 =  -0.140% (+/-0.01%)
core_yield_from.py     313.61 ->     311.11 :   -2.50 =  -0.797% (+/-0.03%)
misc_aes.py            654.29 ->     659.75 :   +5.46 =  +0.834% (+/-0.02%)
misc_mandel.py        4205.10 ->    4272.08 :  +66.98 =  +1.593% (+/-0.01%)
misc_pystone.py       3077.79 ->    3128.39 :  +50.60 =  +1.644% (+/-0.01%)
misc_raytrace.py       388.45 ->     393.71 :   +5.26 =  +1.354% (+/-0.01%)
viper_call0.py         576.83 ->     566.76 :  -10.07 =  -1.746% (+/-0.05%)
viper_call1a.py        550.39 ->     540.12 :  -10.27 =  -1.866% (+/-0.11%)
viper_call1b.py        438.32 ->     432.09 :   -6.23 =  -1.421% (+/-0.11%)
viper_call1c.py        442.96 ->     436.11 :   -6.85 =  -1.546% (+/-0.08%)
viper_call2a.py        536.31 ->     527.37 :   -8.94 =  -1.667% (+/-0.04%)
viper_call2b.py        378.99 ->     377.50 :   -1.49 =  -0.393% (+/-0.08%)

Signed-off-by: Damien George <damien@micropython.org>
2022-05-23 15:43:06 +10:00
Damien George 1fb01bd6c5 py/emitnative: Put a pointer to the native prelude in child_table array.
Some architectures (like esp32 xtensa) cannot read byte-wise from
executable memory.  This means the prelude for native functions -- which is
usually located after the machine code for the native function -- must be
placed in separate memory that can be read byte-wise.  Prior to this commit
this was achieved by enabling N_PRELUDE_AS_BYTES_OBJ for the emitter and
MICROPY_EMIT_NATIVE_PRELUDE_AS_BYTES_OBJ for the runtime.  The prelude was
then placed in a bytes object, pointed to by the module's constant table.

This behaviour is changed by this commit so that a pointer to the prelude
is stored either in mp_obj_fun_bc_t.child_table, or in
mp_obj_fun_bc_t.child_table[num_children] if num_children > 0.  The reasons
for doing this are:

1. It decouples the native emitter from runtime requirements, the emitted
   code no longer needs to know if the system it runs on can/can't read
   byte-wise from executable memory.

2. It makes all ports have the same emitter behaviour, there is no longer
   the N_PRELUDE_AS_BYTES_OBJ option.

3. The module's constant table is now used only for actual constants in the
   Python code.  This allows further optimisations to be done with the
   constants (eg constant deduplication).

Code size change for those ports that enable the native emitter:
   unix x64:   +80 +0.015%
      stm32:   +24 +0.004% PYBV10
    esp8266:   +88 +0.013% GENERIC
      esp32:   -20 -0.002% GENERIC[incl -112(data)]
        rp2:   +32 +0.005% PICO

Signed-off-by: Damien George <damien@micropython.org>
2022-05-17 16:44:49 +10:00
Damien George 07f526067e tools/mpy-tool.py: Intern more strings when freezing.
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 23:52:14 +10:00
Damien George 40d431d1bb tools/mpy-tool.py: Optimise freezing of str when str data is a qstr.
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 23:52:14 +10:00
Damien George e647966fc9 tools/mpy-tool.py: Make global qstr list a dedicated class.
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 23:52:14 +10:00
Damien George dfc6c6299c tools/mpy-tool.py: Optimise freezing of empty str and bytes objects.
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 23:52:14 +10:00
Damien George 9c8a56343f tools/mpy-tool.py: Optimise freezing of ints that can fit a small int.
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 23:52:14 +10:00
Damien George 68b3aeeb57 tools/mpy-tool.py: Support freezing tuples and other consts.
This also simplifies how constants are frozen.

Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 23:52:14 +10:00
Damien George 2a075cc8a9 tools/mpy-tool.py: Support loading tuples from .mpy files.
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 23:52:14 +10:00
Damien George 42d0bd2c17 py/persistentcode: Define enum values for obj types instead of letters.
To keep the separate parts of the code that use these values in sync.  And
make it easier to add new object types.

Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 22:44:04 +10:00
Damien George 6d11c69983 py: Change jump-if-x-or-pop opcodes to have unsigned offset argument.
These jumps are always forwards, and it's more efficient in the VM to
decode an unsigned argument.  These opcodes are already optimised versions
of the sequence "dup-top pop-jump-if-x pop" so it doesn't hurt generality
to optimise them further.

Signed-off-by: Damien George <damien@micropython.org>
2022-03-28 15:43:09 +11:00
Damien George 538c3c0a55 py: Change jump opcodes to emit 1-byte jump offset when possible.
This commit introduces changes:

- All jump opcodes are changed to have variable length arguments, of either
  1 or 2 bytes (previously they were fixed at 2 bytes).  In most cases only
  1 byte is needed to encode the short jump offset, saving bytecode size.

- The bytecode emitter now selects 1 byte jump arguments when the jump
  offset is guaranteed to fit in 1 byte.  This is achieved by checking if
  the code size changed during the last pass and, if it did (if it shrank),
  then requesting that the compiler make another pass to get the correct
  offsets of the now-smaller code.  This can continue multiple times until
  the code stabilises.  The code can only ever shrink so this iteration is
  guaranteed to complete.  In most cases no extra passes are needed, the
  original 4 passes are enough to get it right by the 4th pass (because the
  2nd pass computes roughly the correct labels and the 3rd pass computes
  the correct size for the jump argument).

This change to the jump opcode encoding reduces .mpy files and RAM usage
(when bytecode is in RAM) by about 2% on average.

The performance of the VM is not impacted, at least within measurment of
the performance benchmark suite.

Code size is reduced for builds that include a decent amount of frozen
bytecode.  ARM Cortex-M builds without any frozen code increase by about
350 bytes.

Signed-off-by: Damien George <damien@micropython.org>
2022-03-28 15:41:38 +11:00
robert-hh 5c46721a1c tools/mpy-tool.py: Fix frozen comment generation to escape chars.
That caused the compile of frozen_content.c to fail if characters like
backslash were in a short string.  Thanks to @hippy for identifying the
spot to change.
2022-02-28 18:47:24 +11:00
Damien George f2040bfc7e py: Rework bytecode and .mpy file format to be mostly static data.
Background: .mpy files are precompiled .py files, built using mpy-cross,
that contain compiled bytecode functions (and can also contain machine
code). The benefit of using an .mpy file over a .py file is that they are
faster to import and take less memory when importing.  They are also
smaller on disk.

But the real benefit of .mpy files comes when they are frozen into the
firmware.  This is done by loading the .mpy file during compilation of the
firmware and turning it into a set of big C data structures (the job of
mpy-tool.py), which are then compiled and downloaded into the ROM of a
device.  These C data structures can be executed in-place, ie directly from
ROM.  This makes importing even faster because there is very little to do,
and also means such frozen modules take up much less RAM (because their
bytecode stays in ROM).

The downside of frozen code is that it requires recompiling and reflashing
the entire firmware.  This can be a big barrier to entry, slows down
development time, and makes it harder to do OTA updates of frozen code
(because the whole firmware must be updated).

This commit attempts to solve this problem by providing a solution that
sits between loading .mpy files into RAM and freezing them into the
firmware.  The .mpy file format has been reworked so that it consists of
data and bytecode which is mostly static and ready to run in-place.  If
these new .mpy files are located in flash/ROM which is memory addressable,
the .mpy file can be executed (mostly) in-place.

With this approach there is still a small amount of unpacking and linking
of the .mpy file that needs to be done when it's imported, but it's still
much better than loading an .mpy from disk into RAM (although not as good
as freezing .mpy files into the firmware).

The main trick to make static .mpy files is to adjust the bytecode so any
qstrs that it references now go through a lookup table to convert from
local qstr number in the module to global qstr number in the firmware.
That means the bytecode does not need linking/rewriting of qstrs when it's
loaded.  Instead only a small qstr table needs to be built (and put in RAM)
at import time.  This means the bytecode itself is static/constant and can
be used directly if it's in addressable memory.  Also the qstr string data
in the .mpy file, and some constant object data, can be used directly.
Note that the qstr table is global to the module (ie not per function).

In more detail, in the VM what used to be (schematically):

    qst = DECODE_QSTR_VALUE;

is now (schematically):

    idx = DECODE_QSTR_INDEX;
    qst = qstr_table[idx];

That allows the bytecode to be fixed at compile time and not need
relinking/rewriting of the qstr values.  Only qstr_table needs to be linked
when the .mpy is loaded.

Incidentally, this helps to reduce the size of bytecode because what used
to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices.
If the module uses the same qstr more than two times then the bytecode is
smaller than before.

The following changes are measured for this commit compared to the
previous (the baseline):
- average 7%-9% reduction in size of .mpy files
- frozen code size is reduced by about 5%-7%
- importing .py files uses about 5% less RAM in total
- importing .mpy files uses about 4% less RAM in total
- importing .py and .mpy files takes about the same time as before

The qstr indirection in the bytecode has only a small impact on VM
performance.  For stm32 on PYBv1.0 the performance change of this commit
is:

diff of scores (higher is better)
N=100 M=100             baseline -> this-commit  diff      diff% (error%)
bm_chaos.py               371.07 ->  357.39 :  -13.68 =  -3.687% (+/-0.02%)
bm_fannkuch.py             78.72 ->   77.49 :   -1.23 =  -1.563% (+/-0.01%)
bm_fft.py                2591.73 -> 2539.28 :  -52.45 =  -2.024% (+/-0.00%)
bm_float.py              6034.93 -> 5908.30 : -126.63 =  -2.098% (+/-0.01%)
bm_hexiom.py               48.96 ->   47.93 :   -1.03 =  -2.104% (+/-0.00%)
bm_nqueens.py            4510.63 -> 4459.94 :  -50.69 =  -1.124% (+/-0.00%)
bm_pidigits.py            650.28 ->  644.96 :   -5.32 =  -0.818% (+/-0.23%)
core_import_mpy_multi.py  564.77 ->  581.49 :  +16.72 =  +2.960% (+/-0.01%)
core_import_mpy_single.py  68.67 ->   67.16 :   -1.51 =  -2.199% (+/-0.01%)
core_qstr.py               64.16 ->   64.12 :   -0.04 =  -0.062% (+/-0.00%)
core_yield_from.py        362.58 ->  354.50 :   -8.08 =  -2.228% (+/-0.00%)
misc_aes.py               429.69 ->  405.59 :  -24.10 =  -5.609% (+/-0.01%)
misc_mandel.py           3485.13 -> 3416.51 :  -68.62 =  -1.969% (+/-0.00%)
misc_pystone.py          2496.53 -> 2405.56 :  -90.97 =  -3.644% (+/-0.01%)
misc_raytrace.py          381.47 ->  374.01 :   -7.46 =  -1.956% (+/-0.01%)
viper_call0.py            576.73 ->  572.49 :   -4.24 =  -0.735% (+/-0.04%)
viper_call1a.py           550.37 ->  546.21 :   -4.16 =  -0.756% (+/-0.09%)
viper_call1b.py           438.23 ->  435.68 :   -2.55 =  -0.582% (+/-0.06%)
viper_call1c.py           442.84 ->  440.04 :   -2.80 =  -0.632% (+/-0.08%)
viper_call2a.py           536.31 ->  532.35 :   -3.96 =  -0.738% (+/-0.06%)
viper_call2b.py           382.34 ->  377.07 :   -5.27 =  -1.378% (+/-0.03%)

And for unix on x64:

diff of scores (higher is better)
N=2000 M=2000        baseline -> this-commit     diff      diff% (error%)
bm_chaos.py          13594.20 ->  13073.84 :  -520.36 =  -3.828% (+/-5.44%)
bm_fannkuch.py          60.63 ->     59.58 :    -1.05 =  -1.732% (+/-3.01%)
bm_fft.py           112009.15 -> 111603.32 :  -405.83 =  -0.362% (+/-4.03%)
bm_float.py         246202.55 -> 247923.81 : +1721.26 =  +0.699% (+/-2.79%)
bm_hexiom.py           615.65 ->    617.21 :    +1.56 =  +0.253% (+/-1.64%)
bm_nqueens.py       215807.95 -> 215600.96 :  -206.99 =  -0.096% (+/-3.52%)
bm_pidigits.py        8246.74 ->   8422.82 :  +176.08 =  +2.135% (+/-3.64%)
misc_aes.py          16133.00 ->  16452.74 :  +319.74 =  +1.982% (+/-1.50%)
misc_mandel.py      128146.69 -> 130796.43 : +2649.74 =  +2.068% (+/-3.18%)
misc_pystone.py      83811.49 ->  83124.85 :  -686.64 =  -0.819% (+/-1.03%)
misc_raytrace.py     21688.02 ->  21385.10 :  -302.92 =  -1.397% (+/-3.20%)

The code size change is (firmware with a lot of frozen code benefits the
most):

       bare-arm:  +396 +0.697%
    minimal x86: +1595 +0.979% [incl +32(data)]
       unix x64: +2408 +0.470% [incl +800(data)]
    unix nanbox: +1396 +0.309% [incl -96(data)]
          stm32: -1256 -0.318% PYBV10
         cc3200:  +288 +0.157%
        esp8266:  -260 -0.037% GENERIC
          esp32:  -216 -0.014% GENERIC[incl -1072(data)]
            nrf:  +116 +0.067% pca10040
            rp2:  -664 -0.135% PICO
           samd:  +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS

As part of this change the .mpy file format version is bumped to version 6.
And mpy-tool.py has been improved to provide a good visualisation of the
contents of .mpy files.

In summary: this commit changes the bytecode to use qstr indirection, and
reworks the .mpy file format to be simpler and allow .mpy files to be
executed in-place.  Performance is not impacted too much.  Eventually it
will be possible to store such .mpy files in a linear, read-only, memory-
mappable filesystem so they can be executed from flash/ROM.  This will
essentially be able to replace frozen code for most applications.

Signed-off-by: Damien George <damien@micropython.org>
2022-02-24 18:08:43 +11:00
Artyom Skrobov f46a7140f5 py/qstr: Use `const` consistently to avoid a cast.
Originally at adafruit#4707

Signed-off-by: Artyom Skrobov <tyomitch@gmail.com>
2022-02-11 22:55:02 +11:00
Artyom Skrobov 18b1ba086c py/qstr: Separate hash and len from string data.
This allows the compiler to merge strings: e.g. "update",
"difference_update" and "symmetric_difference_update" will all point to the
same memory.

No functional change.

The size reduction depends on the number of qstrs in the build.  The change
this commit brings is:

   bare-arm:    -4 -0.007%
minimal x86:  +150 +0.092% [incl +48(data)]
   unix x64:  -608 -0.118%
unix nanbox:  -572 -0.126% [incl +32(data)]
      stm32: -1392 -0.352% PYBV10
     cc3200:  -448 -0.244%
    esp8266: -1208 -0.173% GENERIC
      esp32: -1028 -0.068% GENERIC[incl -1020(data)]
        nrf:  -440 -0.252% pca10040
        rp2: -1072 -0.217% PICO
       samd:  -368 -0.264% ADAFRUIT_ITSYBITSY_M4_EXPRESS

Performance is also improved (on bare metal at least) for the
core_import_mpy_multi.py, core_import_mpy_single.py and core_qstr.py
performance benchmarks.

Originally at adafruit#4583

Signed-off-by: Artyom Skrobov <tyomitch@gmail.com>
2022-02-11 22:52:32 +11:00
Jim Mussared e0bf4611c3 py: Only search frozen modules when '.frozen' is found in sys.path.
This changes makemanifest.py & mpy-tool.py to merge string and mpy names
into the same list (now mp_frozen_names).

The various paths for loading a frozen module (mp_find_frozen_module) and
checking existence of a frozen module (mp_frozen_stat) use a common
function that searches this list.

In addition, the frozen lookup will now only take place if the path starts
with ".frozen", which needs to be added to sys.path.

This fixes issues #1804, #2322, #3509, #6419.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-12-18 00:01:59 +11:00
Jim Mussared b326edf68c all: Remove MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE.
This commit removes all parts of code associated with the existing
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the
-mcache-lookup-bc option to mpy-cross.

This feature originally provided a significant performance boost for Unix,
but wasn't able to be enabled for MCU targets (due to frozen bytecode), and
added significant extra complexity to generating and distributing .mpy
files.

The equivalent performance gain is now provided by the combination of
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has
been enabled on the unix port in the previous commit).

It's hard to provide precise performance numbers, but tests have been run
on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V,
xtensa) and they all generally agree on the qualitative improvements seen
by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and
MICROPY_OPT_MAP_LOOKUP_CACHE.

For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the
change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined
with MAP_LOOKUP_CACHE is:

diff of scores (higher is better)
N=2000 M=2000       bccache -> attrmapcache      diff      diff% (error%)
bm_chaos.py        13742.56 ->   13905.67 :   +163.11 =  +1.187% (+/-3.75%)
bm_fannkuch.py        60.13 ->      61.34 :     +1.21 =  +2.012% (+/-2.11%)
bm_fft.py         113083.20 ->  114793.68 :  +1710.48 =  +1.513% (+/-1.57%)
bm_float.py       256552.80 ->  243908.29 : -12644.51 =  -4.929% (+/-1.90%)
bm_hexiom.py         521.93 ->     625.41 :   +103.48 = +19.826% (+/-0.40%)
bm_nqueens.py     197544.25 ->  217713.12 : +20168.87 = +10.210% (+/-3.01%)
bm_pidigits.py      8072.98 ->    8198.75 :   +125.77 =  +1.558% (+/-3.22%)
misc_aes.py        17283.45 ->   16480.52 :   -802.93 =  -4.646% (+/-0.82%)
misc_mandel.py     99083.99 ->  128939.84 : +29855.85 = +30.132% (+/-5.88%)
misc_pystone.py    83860.10 ->   82592.56 :  -1267.54 =  -1.511% (+/-2.27%)
misc_raytrace.py   21490.40 ->   22227.23 :   +736.83 =  +3.429% (+/-1.88%)

This shows that the new optimisations are at least as good as the existing
inline-bytecode-caching, and are sometimes much better (because the new
ones apply caching to a wider variety of map lookups).

The new optimisations can also benefit code generated by the native
emitter, because they apply to the runtime rather than the generated code.
The improvement for the native emitter when LOAD_ATTR_FAST_PATH and
MAP_LOOKUP_CACHE are enabled is (same Linux environment as above):

diff of scores (higher is better)
N=2000 M=2000        native -> nat-attrmapcache  diff      diff% (error%)
bm_chaos.py        14130.62 ->   15464.68 :  +1334.06 =  +9.441% (+/-7.11%)
bm_fannkuch.py        74.96 ->      76.16 :     +1.20 =  +1.601% (+/-1.80%)
bm_fft.py         166682.99 ->  168221.86 :  +1538.87 =  +0.923% (+/-4.20%)
bm_float.py       233415.23 ->  265524.90 : +32109.67 = +13.756% (+/-2.57%)
bm_hexiom.py         628.59 ->     734.17 :   +105.58 = +16.796% (+/-1.39%)
bm_nqueens.py     225418.44 ->  232926.45 :  +7508.01 =  +3.331% (+/-3.10%)
bm_pidigits.py      6322.00 ->    6379.52 :    +57.52 =  +0.910% (+/-5.62%)
misc_aes.py        20670.10 ->   27223.18 :  +6553.08 = +31.703% (+/-1.56%)
misc_mandel.py    138221.11 ->  152014.01 : +13792.90 =  +9.979% (+/-2.46%)
misc_pystone.py    85032.14 ->  105681.44 : +20649.30 = +24.284% (+/-2.25%)
misc_raytrace.py   19800.01 ->   23350.73 :  +3550.72 = +17.933% (+/-2.79%)

In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options:
- are simpler;
- take less code size;
- are faster (generally);
- work with code generated by the native emitter;
- can be used on embedded targets with a small and constant RAM overhead;
- allow the same .mpy bytecode to run on all targets.

See #7680 for further discussion.  And see also #7653 for a discussion
about simplifying mpy-cross options.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 16:04:03 +10:00
Damien George 2c1a6a237d tools/mpy-tool.py: Support relocating ARMv6 arch.
Signed-off-by: Damien George <damien@micropython.org>
2021-05-26 16:24:00 +10:00
Damien George fe16e785fe tools/mpy-tool.py: List frozen modules in MICROPY_FROZEN_LIST_ITEM.
Signed-off-by: Damien George <damien@micropython.org>
2021-01-29 23:57:10 +11:00
Damien George 4f2fe34623 tools/mpy-tool.py: Fix merge of multiple mpy files to POP_TOP correctly.
MP_BC_CALL_FUNCTION will leave the result on the Python stack, so that
result must be discarded by MP_BC_POP_TOP.

Signed-off-by: Damien George <damien@micropython.org>
2020-09-09 00:11:51 +10:00
Martin Milata 492cf34fd8 tools/mpy-tool.py: Fix offset of line number info.
Signed-off-by: Martin Milata <martin@martinmilata.cz>
2020-08-21 16:17:07 +10:00
stijn bcf01d1686 all: Fix implicit conversion from double to float.
These are found when building with -Wfloat-conversion.
2020-04-18 22:42:24 +10:00
Damien George 69661f3343 all: Reformat C and Python source code with tools/codeformat.py.
This is run with uncrustify 0.70.1, and black 19.10b0.
2020-02-28 10:33:03 +11:00
Damien George fc97d6d1b5 tools/mpy-tool.py: Raise exception if trying to freeze relocatable mpy. 2019-12-12 20:15:28 +11:00
Damien George 27879844d2 tools/mpy-tool.py: Add ability to merge multiple .mpy files into one.
Usage:

    mpy-tool.py -o merged.mpy --merge mod1.mpy mod2.mpy

The constituent .mpy files are executed sequentially when the merged file
is imported, and they all use the same global namespace.
2019-12-12 20:15:28 +11:00
Damien George 360d972c16 py/nativeglue: Add new header file with native function table typedef. 2019-12-12 20:15:28 +11:00
Damien George 7f24c29778 tools/mpy-tool.py: Support qstr linking when freezing Xtensa native mpy. 2019-11-28 13:11:51 +11:00
Damien George 36c9be6f60 tools/mpy-tool.py: Use "@progbits #" attribute for native xtensa code. 2019-11-04 15:31:42 +11:00
Damien George 23f0691fdd py/persistentcode: Make .mpy more compact with qstr directly in prelude.
Instead of encoding 4 zero bytes as placeholders for the simple_name and
source_file qstrs, and storing the qstrs after the bytecode, store the
qstrs at the location of these 4 bytes.  This saves 4 bytes per bytecode
function stored in a .mpy file (for example lcd160cr.mpy drops by 232
bytes, 4x 58 functions).  And resulting code size is slightly reduced on
ports that use this feature.
2019-10-15 16:56:27 +11:00
Damien George 9adedce42e py: Add new Xtensa-Windowed arch for native emitter.
Enabled via the configuration MICROPY_EMIT_XTENSAWIN.
2019-10-05 13:44:53 +10:00
Damien George c8c0fd4ca3 py: Rework and compress second part of bytecode prelude.
This patch compresses the second part of the bytecode prelude which
contains the source file name, function name, source-line-number mapping
and cell closure information.  This part of the prelude now begins with a
single varible length unsigned integer which encodes 2 numbers, being the
byte-size of the following 2 sections in the header: the "source info
section" and the "closure section".  After decoding this variable unsigned
integer it's possible to skip over one or both of these sections very
easily.

This scheme saves about 2 bytes for most functions compared to the original
format: one in the case that there are no closure cells, and one because
padding was eliminated.
2019-10-01 12:26:22 +10:00
Damien George b5ebfadbd6 py: Compress first part of bytecode prelude.
The start of the bytecode prelude contains 6 numbers telling the amount of
stack needed for the Python values and exceptions, and the signature of the
function.  Prior to this patch these numbers were all encoded one after the
other (2x variable unsigned integers, then 4x bytes), but using so many
bytes is unnecessary.

An entropy analysis of around 150,000 bytecode functions from the CPython
standard library showed that the optimal Shannon coding would need about
7.1 bits on average to encode these 6 numbers, compared to the existing 48
bits.

This patch attempts to get close to this optimal value by packing the 6
numbers into a single, varible-length unsigned integer via bit-wise
interleaving.  The interleaving scheme is chosen to minimise the average
number of bytes needed, and at the same time keep the scheme simple enough
so it can be implemented without too much overhead in code size or speed.
The scheme requires about 10.5 bits on average to store the 6 numbers.

As a result most functions which originally took 6 bytes to encode these 6
numbers now need only 1 byte (in 80% of cases).
2019-10-01 12:26:22 +10:00
Damien George 5716c5cf65 py/persistentcode: Bump .mpy version to 5.
The bytecode opcodes have changed (there are more, and they have been
reordered).
2019-09-26 16:39:37 +10:00
Josh Lloyd 7d58a197cf py: Rename MP_QSTR_NULL to MP_QSTRnull to avoid intern collisions.
Fixes #5140.
2019-09-26 16:04:56 +10:00
Damien George 1f7202d122 py/bc: Replace big opcode format table with simple macro. 2019-09-26 15:27:11 +10:00