Add ss command to search for a string #12

Open
Dr-Carlos wants to merge 4 commits from Dr-Carlos/fxos:find-string into master
Collaborator

Hello!

This PR implements the ss command, which lets the user search the disassembly for a regex pattern (or just a plain string).

I think the best way of explaining it is to simply show an example output (without colour):

cg_3.60 @ 0x80000000> ss "0xfd8019c0"
<%0000 ClearHourGlass>
 8002c548:  d687   mov.l   0xfd8019c0 HourGlassBitmapNumber, r6

<%0001 HourGlass1>
 8002c550:  d585   mov.l   0xfd8019c0 HourGlassBitmapNumber, r5

<%0002 HourGlassTimer>
 8002c564:  d780   mov.l   0xfd8019c0 HourGlassBitmapNumber, r7
 8002c582:  d779   mov.l   0xfd8019c0 HourGlassBitmapNumber, r7


4 occurences found.

or

cg_3.60 @ 0x80000000> ss "PowerOff"
<%0005>
 8002cafe:  d97f   mov.l   %1839 PowerOff_OS, r9

<%0cba>
 801768b2:  d24a   mov.l   %1839 PowerOff_OS, r2

<%0dde>
 8018bbc6:  d27f   mov.l   %1839 PowerOff_OS, r2

<%0ddf>
 8018bf2e:  d24a   mov.l   %1839 PowerOff_OS, r2

<%11c1>
 801df260:  d253   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801df346:  d21a   mov.l   %1ea5 GetAutoPowerOffFlag, r2

<%127a>
 801e36fc:  d268   mov.l   %1839 PowerOff_OS, r2
 801e3740:  d75f   mov.l   %183a PowerOff, r7

<%12be>
 801e5f16:  d270   mov.l   %1e91 GetAutoPowerOffTime, r2
 801e5f2a:  d26d   mov.l   %1e90 SetAutoPowerOffTime, r2

<%12c0 GetKeyWait_OS>
 801e63de:  d254   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e64e8:  d211   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e6570:  d28a   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e659c:  d27f   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e65ae:  de80   mov.l   %1839 PowerOff_OS, r14

<%12c2>
 801e6632:  d25a   mov.l   %1ea5 GetAutoPowerOffFlag, r2

<%12d2>
 801e6b8a:  d23f   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e6b9c:  d23b   mov.l   %1839 PowerOff_OS, r2

<%1834 RTC_StartHalfSecondPeriodicInterrupt>
 802ae7f0:  d24c   mov.l   %1ea5 GetAutoPowerOffFlag, r2

<%1839 PowerOff_OS>
 802aebf2:  d276   mov.l   %1ea5 GetAutoPowerOffFlag, r2

<%1e05 APP_SYSTEM_POWER>
 8035c892:  d255   mov.l   %1e90 SetAutoPowerOffTime, r2
 8035c8aa:  d24f   mov.l   %1e90 SetAutoPowerOffTime, r2

<%1e06>
 8035ca8e:  d650   mov.l   %1e91 GetAutoPowerOffTime, r6

<%1e60 SpecialMatrixcodeProcessing>
 80363ab4:  d564   mov.l   %1839 PowerOff_OS, r5
 80363b0c:  d352   mov.l   %1ea5 GetAutoPowerOffFlag, r3

Unknown:
 8011b00a:  d194   mov.l   %1e91 GetAutoPowerOffTime, r1
 8011b12c:  d24e   mov.l   %1839 PowerOff_OS, r2
 80150c7c:  d972   mov.l   %1ea4 SetAutoPowerOffFlag, r9
 80150f82:  d169   mov.l   %1ea4 SetAutoPowerOffFlag, r1
 801563e0:  d20d   mov.l   %1839 PowerOff_OS, r2
 802b0ce2:  de38   mov.l   %1e90 SetAutoPowerOffTime, r14
 8036532e:  d571   mov.l   %1839 PowerOff_OS, r5
 803656b2:  d36a   mov.l   %1ea4 SetAutoPowerOffFlag, r3

33 occurences found.

or this

cg_3.60 @ 0x80000000> ss "mov.l   .+PowerOff.+, r2"
<%0cba>
 801768b2:  d24a   mov.l   %1839 PowerOff_OS, r2

<%0dde>
 8018bbc6:  d27f   mov.l   %1839 PowerOff_OS, r2

<%0ddf>
 8018bf2e:  d24a   mov.l   %1839 PowerOff_OS, r2

<%11c1>
 801df260:  d253   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801df346:  d21a   mov.l   %1ea5 GetAutoPowerOffFlag, r2

<%127a>
 801e36fc:  d268   mov.l   %1839 PowerOff_OS, r2

<%12be>
 801e5f16:  d270   mov.l   %1e91 GetAutoPowerOffTime, r2
 801e5f2a:  d26d   mov.l   %1e90 SetAutoPowerOffTime, r2

<%12c0 GetKeyWait_OS>
 801e63de:  d254   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e64e8:  d211   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e6570:  d28a   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e659c:  d27f   mov.l   %1ea5 GetAutoPowerOffFlag, r2

<%12c2>
 801e6632:  d25a   mov.l   %1ea5 GetAutoPowerOffFlag, r2

<%12d2>
 801e6b8a:  d23f   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e6b9c:  d23b   mov.l   %1839 PowerOff_OS, r2

<%1834 RTC_StartHalfSecondPeriodicInterrupt>
 802ae7f0:  d24c   mov.l   %1ea5 GetAutoPowerOffFlag, r2

<%1839 PowerOff_OS>
 802aebf2:  d276   mov.l   %1ea5 GetAutoPowerOffFlag, r2

<%1e05 APP_SYSTEM_POWER>
 8035c892:  d255   mov.l   %1e90 SetAutoPowerOffTime, r2
 8035c8aa:  d24f   mov.l   %1e90 SetAutoPowerOffTime, r2

Unknown:
 8011b12c:  d24e   mov.l   %1839 PowerOff_OS, r2
 801563e0:  d20d   mov.l   %1839 PowerOff_OS, r2

21 occurences found.

or even this

cg_3.60 @ 0x80000000> ss claims=false "mov.l   .+PowerOff.+, r2"
 8011b12c:  d24e   mov.l   %1839 PowerOff_OS, r2
 801563e0:  d20d   mov.l   %1839 PowerOff_OS, r2
 801768b2:  d24a   mov.l   %1839 PowerOff_OS, r2
 8018bbc6:  d27f   mov.l   %1839 PowerOff_OS, r2
 8018bf2e:  d24a   mov.l   %1839 PowerOff_OS, r2
 801df260:  d253   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801df346:  d21a   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e36fc:  d268   mov.l   %1839 PowerOff_OS, r2
 801e5f16:  d270   mov.l   %1e91 GetAutoPowerOffTime, r2
 801e5f2a:  d26d   mov.l   %1e90 SetAutoPowerOffTime, r2
 801e63de:  d254   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e64e8:  d211   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e6570:  d28a   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e659c:  d27f   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e6632:  d25a   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e6b8a:  d23f   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 801e6b9c:  d23b   mov.l   %1839 PowerOff_OS, r2
 802ae7f0:  d24c   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 802aebf2:  d276   mov.l   %1ea5 GetAutoPowerOffFlag, r2
 8035c892:  d255   mov.l   %1e90 SetAutoPowerOffTime, r2
 8035c8aa:  d24f   mov.l   %1e90 SetAutoPowerOffTime, r2

21 occurences found.

When claims=true (default), all addresses which are not claimed are put into an 'Unknown' section, and all addresses which are claimed are listed under their claimed function. This supports claims by syscalls and other functions.
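The grouping described above can be pictured with a minimal sketch. This is not the code from the PR: the `Line` and `SearchResult` types and the `search()` function are hypothetical stand-ins, and the owner of each address is assumed to be already known.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for one rendered disassembly line.
struct Line {
    std::uint32_t address;
    std::string text;                  // e.g. "mov.l   %1839 PowerOff_OS, r2"
    std::optional<std::string> owner;  // claiming function, if any
};

struct SearchResult {
    std::map<std::string, std::vector<Line>> claimed; // matches grouped by owner
    std::vector<Line> unclaimed; // matches with no owner (all matches if claims=false)
    std::size_t count = 0;       // total number of matching lines
};

// Keep every line matching the regex; group by owner only when claims is true.
SearchResult search(const std::vector<Line>& lines, const std::string& pattern,
                    bool claims = true)
{
    std::regex re(pattern);
    SearchResult result;
    for (const Line& line : lines) {
        if (!std::regex_search(line.text, re))
            continue;
        result.count++;
        if (claims && line.owner)
            result.claimed[*line.owner].push_back(line);
        else
            result.unclaimed.push_back(line);
    }
    return result;
}
```

With `claims` set to false everything ends up in the flat list, which mirrors the ungrouped output of the last example.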

To implement this, I had to move the code from PrintPass' analyzeInstruction into analyzeInstructionFull, and add two options: (std::optional<std::string *> output, bool print_syscall_names).

If output is specified, instead of printing to stdout, the output from the print pass is put into that string. If print_syscall_names is false, the <%num name> won't be printed before syscalls.

PrintPass.analyzeInstruction calls analyzeInstructionFull with output as an empty optional and print_syscall_names as true.
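As a rough illustration of that split, here is a simplified sketch using free functions. Only the two parameter types and the names analyzeInstruction and analyzeInstructionFull come from the description above; the `label` and `line` arguments and the `emit()` helper are assumptions made for the example, and the real methods operate on instructions rather than pre-rendered strings.

```cpp
#include <iostream>
#include <optional>
#include <string>

// Hypothetical helper: append to the string if one was given, else print.
void emit(const std::string& text, std::optional<std::string*> output)
{
    if (output && *output)
        **output += text;
    else
        std::cout << text;
}

// Simplified stand-in for the "full" variant with the two extra options.
void analyzeInstructionFull(const std::string& label, const std::string& line,
                            std::optional<std::string*> output,
                            bool print_syscall_names)
{
    if (print_syscall_names && !label.empty())
        emit("<" + label + ">\n", output);
    emit(line + "\n", output);
}

// The original entry point keeps its behaviour: stdout, names printed.
void analyzeInstruction(const std::string& label, const std::string& line)
{
    analyzeInstructionFull(label, line, std::nullopt, true);
}
```

A caller such as the search command would pass the address of a local `std::string` and `false`, then run its regex over the captured text.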

The ss code is reasonably well commented, but if you have any questions, please ask.

I am also unsure if ss is the right name for this. It was mentioned in the name planning issue, so I used it, but you may have a different idea for what that command would do, so please suggest a different name for this one if appropriate.

Owner

Sorry, haven't had time to review in detail yet. Broadly speaking: super useful idea, glad you made it! ss was intended to search for strings inside the binary itself. I guess we can call this ssd for "Search String in Disassembly".

I previously pondered implementing a more general grep mechanism that would allow you to search the output of any command. The basic idea is just to capture stdout while running a command and then filter it, i.e. a bit more general than your approach on print, though the results might not be classified as well. Do you have an opinion on this approach?

Author
Collaborator

All good, there's no rush.

I like your idea about 'grepping' the stdout of any command. This is exactly what the script I based this on does.

The reason this command doesn't do that is that I am pretty new to C++ and was unsure how to capture stdout, or whether doing so would affect speed.

I think a command that 'greps' stdout would be good. Would you suggest that it would use some kind of pipe notation (|) like in bash, or would it just operate on the output of the last command? That would also require storing the command somewhere, so maybe just doing (for example) grep "d $:0x100" PowerOff would be better.

Do you think this command would still be useful with some kind of grep command, or would it no longer make sense to have it?

Owner

> I think a command that 'greps' stdout would be good. Would you suggest that it would use some kind of pipe notation (|) like in bash, or would it just operate on the output of the last command?

There are several directions where we can go. My back-of-mind plan was to decouple analysis/computation and visualization. The best example of this is the h command - it is a pure visualization (hexdump) and some commands like s4 use it as a way to output their results. This is possible in the C++ API using the _h_hexdump() function which is a lot more general than the h command.

When thinking about fxos as a whole, there aren't that many output types for commands - lists of values, lists of ranges, disassemblies, and text probably cover most of them. I think there is an incentive to try and make commands output their results in some normalized form to allow composing commands.

Note that I'm not particularly advocating for composing commands in the shell language. I considered it, but that would probably require variables, with some types, and that'd just bloat an already scuffed piece of syntax. I'd rather write non-trivial tasks in the C++ API and leave only basic things to the shell. This means putting most features in the fxos library and just using shell commands as wrappers to get the results and call a suitable visualization function.

For instance, if you wanted to disassemble all of the functions that call %1839 PowerOff_OS, you would use a library function to obtain all xrefs to it, then look up claims to see what function it's in, then use the disassembly tool on the functions, and finally do an assembly visualization.

So what to do with the shell language? My gut feeling is leave data processing in the library and just put text processing in the shell. So for instance I wouldn't try to extend the shell language in a way that would allow it to connect s4 (which outputs a list of ranges) with h; that would need to be in the C++ code. But grepping is not really a data-oriented task and not API-friendly so I think it's fair to leave it here.

To sum up and clarify my idea:

  • Put text manipulation commands in the shell: | grep because it's so useful, | tee to save to a file (I always need that), maybe stuff like | less.
  • Write non-trivial analysis/computation code in C++, put it in lib/ and make it return data structures (makes it more reusable).
  • Write visualization code in shell/ but avoid doing analysis/computation there.
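A small sketch of that split, under stated assumptions: `Range`, `find_u32()` and `format_ranges()` are hypothetical names, loosely inspired by a value search that yields a list of ranges, and they are not part of the fxos API. The first function is the kind of code that would sit in lib/ and return data; the second is the kind of thin visualization that would sit in shell/.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// --- library side: pure computation, returns a data structure ---
struct Range {
    std::uint32_t start, end;  // half-open interval [start, end)
};

// Find every big-endian 32-bit occurrence of `needle` in `data`,
// where `base` is the address of data[0].
std::vector<Range> find_u32(const std::vector<std::uint8_t>& data,
                            std::uint32_t base, std::uint32_t needle)
{
    std::vector<Range> hits;
    for (std::size_t i = 0; i + 4 <= data.size(); i++) {
        std::uint32_t value = (std::uint32_t(data[i]) << 24)
                            | (std::uint32_t(data[i + 1]) << 16)
                            | (std::uint32_t(data[i + 2]) << 8)
                            |  std::uint32_t(data[i + 3]);
        if (value == needle) {
            std::uint32_t address = base + static_cast<std::uint32_t>(i);
            hits.push_back({address, address + 4});
        }
    }
    return hits;
}

// --- shell side: visualization only, no analysis ---
std::string format_ranges(const std::vector<Range>& ranges)
{
    std::string out;
    char buffer[32];
    for (const Range& r : ranges) {
        std::snprintf(buffer, sizeof buffer, "%08x..%08x\n",
                      static_cast<unsigned>(r.start),
                      static_cast<unsigned>(r.end));
        out += buffer;
    }
    return out;
}
```

Because the library half returns a `std::vector<Range>`, another C++ caller could feed the same result to a hexdump or a disassembly view instead of printing it.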

As far as ss is concerned: the core feature is a grep, so if we have a general | grep then it feels like it wouldn't be needed. That being said, the function names are quite important and I'm not clear whether we can get them easily with a grep. If we can't, then the command should stay.

Also, capturing stdout might be difficult, but we can also require shell commands to output with pre-selected functions instead of directly printf().

Does that make sense to you? Did you maybe have other ideas?

Author
Collaborator

Yep, I agree. Standardising outputs would make things like ssd a lot easier to write.

I can start moving some things into lib/ soon; how much do you think should be done in the library? For example, should all of d go in the library, should the range and address parts be separate, or should it be left as it is and disassemble moved to the library?

Regarding printing, would you use FxOS_log with a different level, or would it be a new command, e.g. FxOS_print?

Also, with | grep, | tee and | less, would you give them more fxos-like names, e.g. | os (output search), | of (output to file) and | op (output print)?

Owner

> I can start moving some things into lib/ soon; how much do you think should be done in the library? For example, should all of d go in the library, should the range and address parts be separate, or should it be left as it is and disassemble moved to the library?

That would be awesome, thanks!

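From the _d function itself I would say that this disassembly loading bit (https://gitea.planet-casio.com/Lephenixnoir/fxos/src/commit/f16ecc370c16e20b213f3e27de2286ce396f7938/shell/d.cpp#L115-L117) doesn't belong in there; it is analysis-related.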

The disassemble function is definitely on the edge. On one hand, it accesses fairly deep methods like analyzeAnonymousFunction() after instantiating a pass explicitly. On the other hand, it would barely be useful as a library function because it's very specific and does nothing for "batch" disassembling since it takes only a single address as its input.

Finally, there's the question of whether d should use the virtual space's disassembly (the one populated by eg. ads) instead of using a temporary one. I think it should; that'd better fit the workflow of a reverse-engineer with one copy of the code that you study/annotate/enrich. But running disassembly passes will also modify the Disassembly object, which means we should at least have a mental model of who's allowed to edit the data and when.

> Regarding printing, would you use FxOS_log with a different level, or would it be a new command, e.g. FxOS_print?

Since FxOS_log() is for developer logging, I guess another function. Ideally one in the FxOS namespace to keep things consistent (FxOS_log() is the only exception so far because it's a macro).

It's not a completely trivial problem, though. When printing we don't want to just accumulate everything into a string until the command finishes because commands can generate large outputs. We certainly want to process the text as it comes. The thing is it might not come in entire lines, which adds further finicky details.

Maybe the most effort-efficient approach is to do as the shell does: spawn a grep process, pipe our stdout to it, then run the command, and disconnect stdout afterwards. This would leave us with a single concern (stdout redirection) while enabling all of the juicy grep options that we're used to.
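One way the spawn-and-redirect idea could look on a POSIX system is sketched below. `run_piped()` is a hypothetical helper, not an fxos function; it relies only on the standard `popen`, `dup`, `dup2` and `pclose` calls, and it assumes the command writes through stdout or `std::cout`.

```cpp
#include <cstdio>
#include <functional>
#include <iostream>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `command` with stdout temporarily piped into a
// shell filter such as "grep PowerOff". Returns the filter's exit status,
// or -1 if the pipe could not be set up. POSIX only.
int run_piped(const std::string& filter, const std::function<void()>& command)
{
    // Flush anything pending so it still goes to the real stdout.
    std::cout.flush();
    std::fflush(stdout);

    int saved = dup(STDOUT_FILENO);           // remember the real stdout
    if (saved < 0)
        return -1;

    FILE* pipe = popen(filter.c_str(), "w");  // filter inherits the real stdout
    if (!pipe) {
        close(saved);
        return -1;
    }

    dup2(fileno(pipe), STDOUT_FILENO);        // fd 1 now feeds the filter
    command();
    std::cout.flush();
    std::fflush(stdout);

    dup2(saved, STDOUT_FILENO);               // disconnect: restore stdout
    close(saved);

    int status = pclose(pipe);                // send EOF, wait for the filter
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the filter is started before the redirection, its own output still lands on the terminal. A production version would also need to decide what happens if the filter exits early, since later writes would then raise SIGPIPE.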

For naming, I'd say since this is text processing and not really fxos commands, we might as well capitalize on everyone's muscle memory and keep the original names :)

Author
Collaborator

> From the _d function itself I would say that this disassembly loading bit (https://gitea.planet-casio.com/Lephenixnoir/fxos/src/commit/f16ecc370c16e20b213f3e27de2286ce396f7938/shell/d.cpp#L115-L117) doesn't belong in there; it is analysis-related.
>
> The disassemble function is definitely on the edge. On one hand, it accesses fairly deep methods like analyzeAnonymousFunction() after instantiating a pass explicitly. On the other hand, it would barely be useful as a library function because it's very specific and does nothing for "batch" disassembling since it takes only a single address as its input.
>
> Finally, there's the question of whether d should use the virtual space's disassembly (the one populated by eg. ads) instead of using a temporary one. I think it should; that'd better fit the workflow of a reverse-engineer with one copy of the code that you study/annotate/enrich. But running disassembly passes will also modify the Disassembly object, which means we should at least have a mental model of who's allowed to edit the data and when.

I will probably move disassemble into lib/ (the function is somewhat useful, I would use it in ssd), and yes, using the main disassembly is probably best for performance as well.

It makes sense in my mind that the first command to add something at an address is the only one to add it. The logic of getInstructionAt(allowDiscovery=true) is what I am thinking of, and this would be the same for all disassembly metadata.

> Since FxOS_log() is for developer logging, I guess another function. Ideally one in the FxOS namespace to keep things consistent (FxOS_log() is the only exception so far because it's a macro).
>
> It's not a completely trivial problem, though. When printing we don't want to just accumulate everything into a string until the command finishes because commands can generate large outputs. We certainly want to process the text as it comes. The thing is it might not come in entire lines, which adds further finicky details.
>
> Maybe the most effort-efficient approach is to do as the shell does: spawn a grep process, pipe our stdout to it, then run the command, and disconnect stdout afterwards. This would leave us with a single concern (stdout redirection) while enabling all of the juicy grep options that we're used to.

This might not be any better than capturing stdout, but could we have a print function which prints to stdout, and at the same time appends the text to a global buffer, which is reset after every semicolon or newline? The | syntax would then mean disabling printing and not resetting the buffer. This would mean that grep, less, tee could all just get the buffer and run with it.
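A minimal sketch of that buffer idea, assuming hypothetical names throughout (`OutputBuffer`, `startCommand()`, `endStatement()` and `grep_lines()` do not exist in fxos). It also assumes the shell knows before running a command whether a | follows it.

```cpp
#include <iostream>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical shared buffer that every shell command would print through.
struct OutputBuffer {
    std::string text;   // everything printed since the last reset
    bool echo = true;   // false while the command feeds a pipe

    // Called before a command runs; `piped` means a | follows it.
    void startCommand(bool piped) { echo = !piped; }

    // Append to the buffer and, unless piped, also print to stdout.
    void print(const std::string& s)
    {
        text += s;
        if (echo)
            std::cout << s;
    }

    // Called at every semicolon or newline in the shell input.
    void endStatement()
    {
        text.clear();
        echo = true;
    }
};

// A filter stage simply reads the buffer: keep lines matching the regex.
std::string grep_lines(const std::string& text, const std::string& pattern)
{
    std::regex re(pattern);
    std::istringstream in(text);
    std::string line, kept;
    while (std::getline(in, line))
        if (std::regex_search(line, re))
            kept += line + "\n";
    return kept;
}
```

Note that this holds the whole output in memory until the statement ends, which is exactly the large-output concern quoted above, so it trades streaming for simplicity.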

Dr-Carlos force-pushed find-string from b40df64b8f to b969f48894 2023-08-26 23:22:53 +02:00
Dr-Carlos added 1 commit 2023-08-26 23:53:03 +02:00
Author
Collaborator

@Lephenixnoir From our discussions a couple of months ago, a fair number of changes were suggested (a grep mechanism, pipes, moving shell functions to `lib/`, etc.). These are good ideas but probably deserve their own issue(s) and would have to be done after your changes in #14.

What do you think about merging `ssd` as it is (or with some changes) and then removing it later once a better mechanism is created?

Owner

Sorry, this PR completely flew off my radar. I agree, it's better to merge it now because there's zero ETA for the other features we discussed.

A few questions on the changes:

> To implement this, I had to move the code from PrintPass' analyzeInstruction into analyzeInstructionFull, and add two options: (std::optional<std::string *> output, bool print_syscall_names).

Do you know whether direct output into a `std::string` is reasonable in performance for large outputs? Should we maybe use a stream instead and then instantiate with a string stream in `_ssd`?

> If print_syscall_names is false, the <%num name> won't be printed before syscalls.

Good idea for a parameter. Would it be possible to make that a `PrintPass` attribute like the promotion parameters?

Thanks again for your continued involvement!

Dr-Carlos added 1 commit 2023-08-27 12:32:12 +02:00
f58ab802d0 lib: use ostream in analyzeInstructionOutput out; add
print_syscall_names PrintPass attribute
Author
Collaborator

> Do you know whether direct output into a `std::string` is reasonable in performance for large outputs? Should we maybe use a stream instead and then instantiate with a string stream in `_ssd`?

I tried `std::stringstream` and it doesn't have a noticeable effect on speed (for me). Sticking with `std::string` and reserving 20k characters removed about 10-25 seconds on long outputs, but I decided to go with streams because this simplifies the logic compared to the current `std::string` implementation.
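For readers comparing the two options, a rough sketch of each is below. It is not the PR's code: `Line`, `printLine()`, `renderToString()` and `renderReserved()` are hypothetical stand-ins, and the 20000-character hint simply mirrors the "20k characters" mentioned above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one disassembled instruction.
struct Line
{
    std::uint32_t address;
    std::string text;
};

// Stream version: the same function serves std::cout (for a printing
// command) and std::ostringstream (for a searching command).
void printLine(std::ostream &out, Line const &line)
{
    std::ios_base::fmtflags saved = out.flags();
    out << ' ' << std::hex << line.address << ":  " << line.text << '\n';
    out.flags(saved); // std::hex is sticky, so restore the caller's flags
}

std::string renderToString(std::vector<Line> const &lines)
{
    std::ostringstream out;
    for(auto const &line: lines)
        printLine(out, line);
    return out.str();
}

// String version: append directly, reserving capacity up front to avoid
// repeated reallocation on long outputs.
std::string renderReserved(std::vector<Line> const &lines,
    std::size_t hint = 20000)
{
    std::string out;
    out.reserve(hint);
    char addr[16];
    for(auto const &line: lines) {
        std::snprintf(addr, sizeof addr, "%x",
            static_cast<unsigned>(line.address));
        out += ' ';
        out += addr;
        out += ":  ";
        out += line.text;
        out += '\n';
    }
    return out;
}
```

Both functions produce the same text; the stream version needs no separate formatting path, which is the simplification the comment refers to.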

> Good idea for a parameter. Would it be possible to make that a `PrintPass` attribute like the promotion parameters?

Yep, done. Let me know if you think this attribute should be used in other parts of the `PrintPass`.
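As an illustration of the attribute-based design, here is a simplified sketch. Only the name `print_syscall_names` and the `<%num name>` header format come from this thread; `MiniPrintPass`, `printSyscallEntry()` and `renderEntry()` are hypothetical and much smaller than the real `PrintPass`.

```cpp
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, heavily simplified stand-in for PrintPass.
struct MiniPrintPass
{
    explicit MiniPrintPass(std::ostream &out): m_out {out} {}

    // Attribute set by the caller before running the pass, rather than an
    // extra argument threaded through every call.
    bool print_syscall_names = true;

    // Print the optional <%num name> header, then the instruction line.
    void printSyscallEntry(unsigned number, std::string const &name,
        std::string const &instruction)
    {
        if(print_syscall_names) {
            char id[16];
            std::snprintf(id, sizeof id, "%04x", number);
            m_out << "<%" << id;
            if(!name.empty())
                m_out << ' ' << name;
            m_out << ">\n";
        }
        m_out << ' ' << instruction << '\n';
    }

private:
    std::ostream &m_out;
};

// Usage: render one entry with the header switched on or off.
std::string renderEntry(bool with_names)
{
    std::ostringstream out;
    MiniPrintPass pass {out};
    pass.print_syscall_names = with_names;
    pass.printSyscallEntry(0x0000, "ClearHourGlass", "8002c548:  mov.l r6");
    return out.str();
}
```

With the flag stored on the pass object, a search command can disable the headers once and print them itself per group of matches, while other commands keep the default.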

> Thanks again for your continued involvement!

No problem!

Reference: Lephenixnoir/fxos#12