From 5be56fe2eb1c9dedb262aaaff63fd308d33fa034 Mon Sep 17 00:00:00 2001 From: Lephenixnoir Date: Tue, 13 Oct 2020 19:30:45 +0200 Subject: [PATCH] let the user allocate contexts locally This incompatible change modifies the allocation method for contexts. Instead of managing a set of IDs in an malloc()-backed array, contexts can now be allocated on the stack or static RAM. Contexts are values of type [prof_t] and can be initialized to the value provided by prof_make(). lvalues to contexts replace context IDs. Finally, prof_exec() no longer requires a context to be specified, it creates one on the fly. --- README.md | 191 +++++++++++++++++++++++------------------------------- libprof.c | 48 +++----------- libprof.h | 70 ++++++++++++-------- 3 files changed, 133 insertions(+), 176 deletions(-) diff --git a/README.md b/README.md index 3e05fae..711e496 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,8 @@ # libprof: A performance profiling library for gint -libprof is a small gint library that can be used to time and profile the -execution of an add-in. Using it, one can record the time spent in one or -several functions to identify performance bottlenecks in an application. +libprof is a small gint library that times and profiles the execution of an +add-in. It is useful to record the time spent in one or several functions and +to identify performance bottlenecks in an application. libprof's measurements are accurate down to the microsecond-level thanks to precise hardware timers, so it can also be used to time even small portions of @@ -11,7 +11,7 @@ code. ## Building libprof is built and installed only once for both fx-9860G and fx-CG 50. The -The dependencies are: +dependencies are: * A GCC cross-compiler for a SuperH architecture * The [gint kernel](/Lephenixnoir/gint) @@ -37,153 +37,122 @@ will need to make and install twice. ## Basic use -To access the library, include the `` header file. +To access the library, include the `` header file and call +`prof_init()` somewhere so that libprof has access to a precise timer. If no +such timer is available, `prof_init()` returns non-zero (but normally either 2 +or 3 of the TMU are available when an add-in starts). ```c #include +prof_init(); ``` -For each function you want to time, libprof will create a counter. At the start -of the program, you need to specify how many functions (libprof calls them -*contexts*) you will be timing, so that libprof can allocate enough memory. - -libprof also needs one of gint's timer to actually measure time; it selects one -of TMU0, TMU1 and TMU2 which are the only ones suited for this precise job. The -available timer with the smallest interrupt priority is selected. - -This initialization happens in the `prof_init()` function. +To measure execution time, create a profiling context with `prof_make()`, then +call `prof_enter()` at the beginning of the code to time and `prof_leave()` at +the end. ```c -/* Initialize libprof for 13 contexts (automatically selects a timer) */ -prof_init(13); -``` - -You can then measure the execution time of a function by calling `prof_enter()` -at the beginning and `prof_end()` at the end. You just need to "name" the -function by giving its context ID, which is any number between 0 and the number -of contexts passed to `prof_init()` (here 0 to 12). - -```c -void function5(void) +void function_to_time(void) { - prof_enter(5); + prof_t prof = prof_make(); + prof_enter(prof); + /* Do stuff... */ - prof_leave(5); + + prof_leave(prof); } ``` -This will add `function5()`'s execution time to the 5th counter, so if the -function is called several times the total execution time will be recorded. -This way, at the end of the program, you can look at the counters to see where -most of the time has been spent. - -To retrieve the total execution time of a function, use `prof_time()` : +The context records the time spent between calls to `prof_enter()` and +`prof_leave()`. It can be entered multiple times and will simply accumulate the +time spent in its counter. When the counter is not running, you can query +recorded time with `prof_time()`. ```c -uint32_t total_function5_us = prof_time(5); +uint32_t total_function_us = prof_time(prof); ``` -This time is measured in microseconds, even though the timers are actually more +This time is returned in microseconds, even though the timers are slightly more precise than this. Note that the overhead of `prof_enter()` and `prof_leave()` -is usually less than 1 microsecond, so the time is very close to the actual -time spent in the function even if the context is frequently entered and left. +is usually less than 1 microsecond, so the measure is very close to the +wall-clock time spent in the function even if the context is frequently used. -At the end of the program, free the resources of the library by calling -`prof_quit()`. +At the end of the program or whenever you need the timer that libprof is +holding, call `prof_quit()` to free the resources. This will make the timer +available to `timer_setup()` again. ```c prof_quit(); ``` -## Managing context numbers +## Recursive functions -The number of contexts must be set for all execution and all context IDs must -be between 0 and this number (excluded). Managing the numbers by hand is error- -prone and can lead to memory errors. - -A simple way of managing context numbers without risking an error is to use an -enumeration. +The profiler context keeps track of recursive calls so that functions that +enter and leave recursively can be timed as well. The only difference with the +basic example is that we need to make sure a single context is used (instead +of creating a new one in each stack frame). Making it static is enough. ```c -enum { - /* Whatever function you need */ - PROFCTX_FUNCTION1 = 0, - PROFCTX_FUNCTION2, - PROFCTX_FUNCTION3, - - PROFCTX_COUNT, -}; -``` - -Enumerations will assign a value to all the provided names, and increment by -one each time. So for example here `PROFCTX_FUNCTION2` is equal to `1` and -`PROFCTX_COUNT` is equal to `3`. As you can see this is conveniently equal to -the number of contexts, which makes it simple to initialize the library: - -```c -prof_init(PROFCTX_COUNT); -``` - -Then you can use context names instead of numbers: - -```c -prof_enter(PROFCTX_FUNCTION1); -/* Do stuff... */ -prof_leave(PROFCTX_FUNCTION1); -``` - -If you want to use a new context, you just need to add a name in the -enumeration (anywhere but after `PROFCTX_COUNT`) and all IDs plus the -initialization call will be updated automatically. - -## Timing a single execution - -`prof_enter()` and `prof_leave()` will add the measured execution time to the -context counter. Sometimes you want to make individual measurements instead of -adding all calls together. To achieve this effect, clear the counter before -the measure using `prof_clear()`. - -Here is an example of a function `exec_time_us()` that times the execution of a -function `f` passed as parameter. - -```c -uint32_t exec_time_us(void (*f)(void)) +void recursive_function_to_time(void) { - int ctx = PROFCTX_EXEC_TIME_US; - prof_clear(ctx); - prof_enter(ctx); + static prof_t prof = prof_make(); + prof_enter(prof); - f(); + /* Stuff... */ + recursive_function_to_time(); + /* Stuff... */ - prof_leave(ctx); - return prof_time(ctx); + prof_leave(prof); } ``` -## Exploiting the measure's precision +However it makes it somewhat difficult to retrieve the elapsed time because it +is hidden withing the function's name scope. Making it global can help. -The overhead of `prof_enter()` and `prof_leave()` is usually less than a -microsecond, but the starting time of your benchmark might count (loading data -from memory to initialize arrays, performing function calls...). In this case, -the best you can do is measure the time difference between two similar calls. +## Timing a single execution -If you need something even more precise then you can access libprof's counter -array directly to get the timer-tick value itself: +In many cases applications just need to measure a single piece of code once and +get the resulting time. `prof_exec()` is a shortcut macro that does exactly +that, creating a temporary context and returning elapsed time. -```c -uint32_t elapsed_timer_tick = prof_elapsed[ctx]; +``` +uint32_t elapsed_us = prof_exec({ + scene_graph_render(); +}); ``` -The frequency of this tick is PΦ/4, where the value of PΦ can be obtained by -querying gint's clock module: +The macro expands to a short code block that wraps the argument and returns +the `prof_time()` of the temporary context. + +## Using the timers's full precision + +Hardware timers run at 7-15 MHz depending on the calculator model, so the time +measure it slightly more precise than what `prof_time()` shows you. You can +access that value through the `elapsed` field of a context object. + +The value is a count of ticks that occur at a frequency of PΦ/4, where the +value of PΦ can be obtained by querying gint's CPG driver: ```c #include uint32_t tick_freq = clock_freq()->Pphi_f / 4; ``` -One noteworthy phenomenon is the startup cost. The first few measurements are -always less precise, probably due to cache effects. I frequently have a first -measurement with an additional 100 us of execution time and 3 us of overhead, -which subsequent tests remove. So it is expected for the first few points of -data to lie outside the range of the next. +Note that the overhead of `prof_enter()` and `prof_leave()` is usually less +than a microsecond, but more than a few timer ticks. + +Due in part to caching effects, the first measurement for a given code sequence +is likely to be larger than the other ones. I have seen effects such as 3 µs +for a no-op (cache misses in libprof's code) and 100 µs for real cases (cache +misses in the code itself). Make sure to make several measurements and use +serious statistical methods! + +## Overclock interference + +Contexts store timer tick counts, which are converted to microsecond delays +only when `prof_time()` is called. Do not mix measurements performed at +different overclock settings as the results will be erroneous. + +What you can do is call `prof_time()` and reset the context (by assigning it +`prof_make()`) before switching clock settings, then add up the microsecond +delays when the execution is over. diff --git a/libprof.c b/libprof.c index ab82c0c..017a9ac 100644 --- a/libprof.c +++ b/libprof.c @@ -1,51 +1,34 @@ #include #include #include -#include #include #include -/* Recursion depth of each function currently being executed */ -uint8_t *prof_rec = NULL; -/* Time that has elapsed within each function; the value for a given function - is only relevant when it is not executing, due to optimizations */ -uint32_t *prof_elapsed = NULL; /* Timer counter */ uint32_t volatile *prof_tcnt = NULL; /* Timer ID */ static int prof_timer = -1; -/* prof_init(): Initialize the profiler's data and timer */ -int prof_init(int context_count) +/* prof_init(): Initialize the profiler's timer */ +int prof_init(void) { - prof_rec = malloc(context_count * sizeof *prof_rec); - prof_elapsed = malloc(context_count * sizeof *prof_elapsed); - /* Get a TMU with the exact constant 0xffffffff */ int timer = -1; for(int t = 2; t >= 0 && timer == -1; t--) { timer = timer_setup(t, 0xffffffff, NULL); } - - if(!prof_rec || !prof_elapsed || timer == -1) + if(timer == -1) { prof_quit(); return 1; } - /* Fix the configuration done by gint by disabling the interrupt */ - if(isSH3()) - { - SH7705_TMU.TMU[timer].TCR.UNIE = 0; - prof_tcnt = &SH7705_TMU.TMU[timer].TCNT; - } - else - { - SH7305_TMU.TMU[timer].TCR.UNIE = 0; - prof_tcnt = &SH7305_TMU.TMU[timer].TCNT; - } + /* Keep the address of the TCNT register */ + prof_tcnt = isSH3() + ? &SH7705_TMU.TMU[timer].TCNT + : &SH7305_TMU.TMU[timer].TCNT; timer_start(timer); prof_timer = timer; @@ -53,27 +36,16 @@ int prof_init(int context_count) return 0; } -/* prof_quit(): Free the profiler's data and timer */ +/* prof_quit(): Free the profiler's timer */ void prof_quit(void) { if(prof_timer >= 0) timer_stop(prof_timer); - if(prof_rec) free(prof_rec); - if(prof_elapsed) free(prof_elapsed); - prof_timer = -1; - prof_rec = NULL; - prof_elapsed = NULL; } -//--- -// Post-measurement analysis -//--- - /* prof_time(): Time spent in a given context, in microseconds */ -uint32_t prof_time(int ctx) +uint32_t prof_time(prof_t prof) { int Pphi = clock_freq()->Pphi_f; - uint64_t time = prof_elapsed[ctx]; - - return (time * 4 * 1000000) / Pphi; + return ((uint64_t)prof.elapsed * 4 * 1000000) / Pphi; } diff --git a/libprof.h b/libprof.h index ec1a1e8..2dd0db4 100644 --- a/libprof.h +++ b/libprof.h @@ -6,37 +6,43 @@ #define LIBPROF_LIBPROF #include +#include //--- // Initialization //--- -/* prof_init(): Initialize the profiler's data and timer +/* prof_init(): Initialize the profiler's timer - Initializes [prof_rec] and [prof_time] (see below) with enough elements to - hold all the context IDs. Context IDs should be numbered from 0 to [n-1]; - due to speed requirements array bounds are not checked so be careful. + Starts a timer to count time. libprof automatically selects a TMU and tries + to use TMU2 before TMU1 before TMU0 so that high-priority interrupts remain + available, and sets an accurate clock configuration. - Also starts a timer to count time. libprof automatically selects a TMU and - tries to use TMU2 before TMU1 before TMU0 so that high-priority interrupts - remain available, and sets an accurate clock configuration. + Returns non-zero if no timer is available. */ +int prof_init(void); - @context_count Number of different contexts that will be measured - Returns non-zero if a setup error occurs or no timer is available. */ -int prof_init(int context_count); - -/* prof_quit(): Free the profiler's data and timer */ +/* prof_quit(): Free the profiler's timer */ void prof_quit(void); //--- // Runtime time measurement //--- -/* Recursion depth of each function currently being executed */ -extern uint8_t *prof_rec; -/* Time that has elapsed within each function; the value for a given function - is only relevant when it is not executing, due to optimizations */ -extern uint32_t *prof_elapsed; +/* Context object, has an elasped delay and a recursion level. This object can + be created on the stack of a function that measures its execution time, + except if the function is recursive, in which case it should be either + static or global. */ +typedef struct prof_t +{ + uint32_t rec; + uint32_t elapsed; + +} GPACKED(4) prof_t; + +/* prof_make(): Create a new context object + A context can be cleared by assigning it prof_make() again. */ +#define prof_make() ((prof_t){ 0, 0 }) + /* Timer counter */ extern uint32_t volatile *prof_tcnt; @@ -44,8 +50,8 @@ extern uint32_t volatile *prof_tcnt; This macro should be called at the start of the context scope. If the function was already executing then the deepest instance in the stack is used instead of creating a new counter. */ -#define prof_enter(ctx) { \ - if(!prof_rec[ctx]++) prof_elapsed[ctx] += *prof_tcnt; \ +#define prof_enter(prof) { \ + if(!prof.rec++) prof.elapsed += *prof_tcnt; \ } /* prof_leave(): Stop counting time for a function @@ -53,15 +59,25 @@ extern uint32_t volatile *prof_tcnt; stops if there is no deeper instance of the context in the stack. If there are not as exactly as many prof_leave()'s as prof_enter()'s then the resulting time measure will not be relevant at all. */ -#define prof_leave(ctx) { \ - if(!--prof_rec[ctx]) prof_elapsed[ctx] -= *prof_tcnt; \ +#define prof_leave(prof) { \ + if(!--prof.rec) prof.elapsed -= *prof_tcnt; \ } -/* prof_clear(): Clear a context's counter - This operation is defined only if the context is not being profiled. */ -#define prof_clear(ctx) { \ - prof_elapsed[ctx] = 0; \ -} +/* prof_exec(): Measure a single block of code + This operation can be used when profiling is not required, and instead + libprof is used to measure the performance of a single bit of code. Use it + like this: + + uint32_t elasped_us = prof_exec({ + exec_code(); + }); */ +#define prof_exec(code) ({ \ + prof_t prof = prof_make(); \ + prof_enter(prof); \ + code; \ + prof_leave(prof); \ + prof_time(prof); \ +}) //--- // Post-measurement analysis @@ -69,6 +85,6 @@ extern uint32_t volatile *prof_tcnt; /* prof_time(): Time spent in a given context, in microseconds Should only be called when the context is not currently executing. */ -uint32_t prof_time(int ctx); +uint32_t prof_time(prof_t prof); #endif /* LIBPROF_LIBPROF */