Add `ss` command to search for a string #12
Hello!
This PR implements the `ss` command, allowing the user to search disassembly for a regex pattern (or just a string).
I think the best way of explaining it is to simply show an example output (without colour):
or
or this
or even this
When claims=true (default), all addresses which are not claimed are put into an 'Unknown' section, and all addresses which are claimed are listed under their claimed function. This supports claims by syscalls and other functions.
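As a rough illustration of the grouping described above, here is a minimal sketch. `Line`, `Claim`, `ownerOf` and `searchByClaim` are hypothetical names invented for this example and are not the fxos types; the real command works on the disassembly and claim structures of the library.

```cpp
#include <cstdint>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not the fxos data structures.
struct Line {
    uint32_t address;
    std::string text;  // printed disassembly for this address
};
struct Claim {
    uint32_t start, end;  // half-open range [start, end)
    std::string owner;    // name of the claiming function or syscall
};

// Return the owner claiming `address`, or "Unknown" if no claim covers it.
std::string ownerOf(uint32_t address, const std::vector<Claim> &claims)
{
    for(const Claim &c : claims)
        if(address >= c.start && address < c.end)
            return c.owner;
    return "Unknown";
}

// Collect the addresses of all lines matching `pattern`, grouped by owner.
std::map<std::string, std::vector<uint32_t>> searchByClaim(
    const std::vector<Line> &lines, const std::vector<Claim> &claims,
    const std::string &pattern)
{
    std::regex re(pattern);
    std::map<std::string, std::vector<uint32_t>> sections;
    for(const Line &l : lines)
        if(std::regex_search(l.text, re))
            sections[ownerOf(l.address, claims)].push_back(l.address);
    return sections;
}
```

A plain string works as a pattern here because a string without special characters is also a valid regex.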
To implement this, I had to move the code from PrintPass' analyzeInstruction into analyzeInstructionFull, and add two options: (std::optional<std::string *> output, bool print_syscall_names).
If output is specified, instead of printing to stdout, the output from the print pass is put into that string. If print_syscall_names is false, the <%num name> won't be printed before syscalls.
PrintPass.analyzeInstruction calls analyzeInstructionFull with output as an empty optional and print_syscall_names as true.
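The pattern can be sketched as follows. This is not the PR's code: `emit` and `describeSyscall` are hypothetical names, and the fixed `jsr @r2` text stands in for whatever the print pass would really output. Only the two option types are taken from the description above.

```cpp
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical helper: send `text` to *output if one was given, else to stdout.
void emit(const std::string &text, std::optional<std::string *> output)
{
    if(output && *output)
        (*output)->append(text);
    else
        std::fputs(text.c_str(), stdout);
}

// Hypothetical "full" variant taking both options.
void describeSyscall(const std::string &syscall,
    std::optional<std::string *> output, bool print_syscall_names)
{
    if(print_syscall_names)
        emit("<" + syscall + "> ", output);
    emit("jsr @r2\n", output);  // placeholder for the real instruction text
}

// Wrapper keeping the original behaviour: print to stdout, with names.
void describeSyscall(const std::string &syscall)
{
    describeSyscall(syscall, std::nullopt, true);
}
```

Existing callers keep using the one-argument form, while a search command can pass the address of a `std::string` to capture the text instead.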
The `ss` code is reasonably well commented, but if you have any questions, please ask.
I am also unsure if `ss` is the right name for this. It was mentioned in the name planning issue, so I used it, but you may have a different idea for what that command would do, so please suggest a different name for this one if appropriate.
Sorry, haven't had time to review in detail yet. Broadly speaking: super useful idea, glad you made it!
`ss` was intended to search for strings inside the binary itself. I guess we can call this `ssd` for "Search String in Disassembly".
I previously pondered implementing a more general grep mechanism that would allow you to search the output of any command. The basic idea is just to capture stdout while running a command and then filter it, i.e. a bit more general than your approach on print, though the results might not be classified as well. Do you have an opinion on this approach?
All good, there's no rush.
I like your idea about 'grepping' the stdout of any command. This is exactly what the script I based this on does.
The reason why this command doesn't do that is that I am pretty new to C++ and was unsure how to capture stdout, or whether doing so would affect the speed.
I think a command that 'greps' stdout would be good. Would you suggest that it would use some kind of pipe notation (|) like in bash, or would it just operate on the output of the last command? That would also require storing the command somewhere, so maybe just doing (for example)
grep "d $:0x100" PowerOff
would be better.
Do you think this command would still be useful with some kind of grep command, or would it no longer make sense to have it?
There are several directions where we can go. My back-of-mind plan was to decouple analysis/computation and visualization. The best example of this is the `h` command: it is a pure visualization (hexdump) and some commands like `s4` use it as a way to output their results. This is possible in the C++ API using the `_h_hexdump()` function, which is a lot more general than the `h` command.
When thinking about fxos as a whole, there aren't that many output types for commands: lists of values, lists of ranges, disassemblies, and text probably cover most of them. I think there is an incentive to try and make commands output their results in some normalized form to allow composing commands.
Note that I'm not particularly advocating for composing commands in the shell language. I considered it, but that would probably require variables, with some types, and that'd just bloat an already scuffed piece of syntax. I'd rather write non-trivial tasks in the C++ API and leave only basic things to the shell. This means putting most features in the fxos library and just using shell commands as wrappers to get the results and call a suitable visualization function.
For instance, if you wanted to disassemble all of the functions that call `%1839 PowerOff_OS`, you would use a library function to obtain all xrefs to it, then look up claims to see what function each one is in, then use the disassembly tool on the functions, and finally do an assembly visualization.
So what to do with the shell language? My gut feeling is to leave data processing in the library and just put text processing in the shell. So for instance I wouldn't try to extend the shell language in a way that would allow it to connect `s4` (which outputs a list of ranges) with `h`; that would need to be in the C++ code. But grepping is not really a data-oriented task and not API-friendly, so I think it's fair to leave it here.
To sum up and clarify my idea:
- Text processing in the shell: `| grep` because it's so useful, `| tee` to save to a file (I always need that), maybe stuff like `| less`.
- Put analysis in `lib/` and make it return data structures (makes it more reusable).
- Keep visualization in `shell/` but avoid doing analysis/computation there.
As far as `ss` is concerned: the core feature is a grep, so if we have a general `| grep` then it feels like it wouldn't be needed. That being said, the function names are quite important and I'm not clear whether we can get them easily with a grep. If we can't, then the command should stay.
Also, capturing stdout might be difficult, but we can also require shell commands to output with pre-selected functions instead of calling `printf()` directly.
Does that make sense to you? Did you maybe have other ideas?
Yep, I agree. Standardising outputs would make things like `ssd` a lot easier to write.
I can start moving some things into `lib/` soon; how much do you think should be done in the library? For example, should all of `d` go in the library, should the range and address parts be separate, or should it be left how it is and `disassemble` be moved to the library?
Regarding printing, would you use FxOS_log with a different level, or would it be a new command, e.g. FxOS_print?
Also, with `| grep`, `| tee` and `| less`, would you give them more fxos-like names, e.g. `| os` (output search), `| of` (output to file) and `| op` (output print)?
That would be awesome, thanks!
From the `_d` function itself I would say that this disassembly loading bit doesn't belong in there; this is analysis-related.
The `disassemble` function is definitely on the edge. On one hand, it accesses fairly deep methods like `analyzeAnonymousFunction()` after instantiating a pass explicitly. On the other hand, it would barely be useful as a library function because it's very specific and does nothing for "batch" disassembling since it takes only a single address as its input.
Finally, there's the question of whether `d` should use the virtual space's disassembly (the one populated by e.g. `ads`) instead of using a temporary one. I think it should; that'd better fit the workflow of a reverse-engineer with one copy of the code that you study/annotate/enrich. But running disassembly passes will also modify the `Disassembly` object, which means we should at least have a mental model of who's allowed to edit the data and when.
Since `FxOS_log()` is for developer logging, I guess another function. Ideally one in the `FxOS` namespace to keep things consistent (`FxOS_log()` is the only exception so far because it's a macro).
It's not a completely trivial problem, though. When printing we don't want to just accumulate everything into a string until the command finishes because commands can generate large outputs. We certainly want to process the text as it comes. The thing is it might not come in entire lines, which adds further finicky details.
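To make the "not in entire lines" detail concrete, here is a minimal sketch of a filter that accepts output in arbitrary chunks and only forwards complete lines that match a regex. `LineFilter` and its members are hypothetical names for this example, not part of fxos.

```cpp
#include <cstddef>
#include <functional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical chunk-tolerant grep: buffers text until a full line is seen.
class LineFilter
{
public:
    LineFilter(const std::string &pattern,
        std::function<void(const std::string &)> sink)
        : m_re(pattern), m_sink(std::move(sink)) {}

    // Accept a chunk that may contain zero, one or several line endings.
    void write(const std::string &chunk)
    {
        m_partial += chunk;
        std::size_t pos;
        while((pos = m_partial.find('\n')) != std::string::npos) {
            emitIfMatch(m_partial.substr(0, pos));
            m_partial.erase(0, pos + 1);
        }
    }

    // Handle a final line that was never terminated by '\n'.
    void finish()
    {
        if(!m_partial.empty())
            emitIfMatch(m_partial);
        m_partial.clear();
    }

private:
    void emitIfMatch(const std::string &line)
    {
        if(std::regex_search(line, m_re))
            m_sink(line);
    }

    std::regex m_re;
    std::function<void(const std::string &)> m_sink;
    std::string m_partial;  // incomplete line carried over between chunks
};
```

Only the current incomplete line is kept in memory, so large outputs are processed as they come rather than accumulated.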
Maybe the most effort-efficient approach is to do as the shell does: spawn a `grep` process, pipe our stdout to it, then run the command, and disconnect stdout afterwards. This would leave us with a single concern (stdout redirection) while enabling all of the juicy grep options that we're used to.
For naming, I'd say since this is text processing and not really fxos commands, we might as well capitalize on everyone's muscle memory and keep the original names :)
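The spawn-and-redirect approach could look roughly like this on a POSIX system. It is a sketch under assumptions: `runPiped` is a hypothetical name, the filter is started through `popen()` (so it goes through `/bin/sh`), and the command is assumed to print through C stdio.

```cpp
#include <functional>
#include <string>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `command` with stdout piped into `filter`
// (e.g. "grep jsr"). Returns the filter's exit status, or -1 on error.
int runPiped(const std::string &filter, const std::function<void()> &command)
{
    fflush(stdout);                    // earlier text goes to the real stdout
    int saved = dup(STDOUT_FILENO);    // remember the real stdout
    if(saved < 0)
        return -1;

    FILE *pipe = popen(filter.c_str(), "w");
    if(!pipe) {
        close(saved);
        return -1;
    }

    dup2(fileno(pipe), STDOUT_FILENO); // stdout now feeds the filter
    command();
    fflush(stdout);                    // push buffered text into the pipe

    dup2(saved, STDOUT_FILENO);        // disconnect: restore the real stdout
    close(saved);

    int status = pclose(pipe);         // closes the pipe and waits for the filter
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The filter inherits the original stdout because it is started before the redirection, so its results still reach the terminal. A command printing through `std::cout` would need that stream flushed as well before stdout is restored.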
I will probably move `disassemble` into `lib/` (the function is somewhat useful, I would use it in `ssd`), and yes, using the main disassembly is probably best for performance as well.
It makes sense in my mind that the first command to add something at an address is the only one to add it. The logic of `getInstructionAt(allowDiscovery=true)` is what I am thinking, and this would be the same for all disassembly metadata.
This might not be any better than capturing stdout, but could we have a print function which prints to stdout, and at the same time appends the text to a global buffer, which is reset after every semicolon or newline? The `|` syntax would then mean disabling printing and not resetting the buffer. This would mean that grep, less, tee could all just get the buffer and run with it.
Force-pushed from b40df64b8f to b969f48894.
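The buffer idea from the comment above can be sketched like this. `OutputBuffer` and its members are hypothetical names for illustration; a single global instance would play the role of the "global buffer".

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical sketch of the "print and remember" idea; not fxos code.
struct OutputBuffer
{
    std::string buffer;   // text printed by the current command
    bool piping = false;  // true when the command is followed by `|`

    // Called by the shell when it parses `command | filter`.
    void beginPipe() { piping = true; }

    // Print function that commands would use instead of printf().
    void print(const std::string &text)
    {
        buffer += text;
        if(!piping)
            std::fputs(text.c_str(), stdout);
    }

    // Called at every `;` or newline: drop the text unless a filter wants it.
    void endOfCommand()
    {
        if(!piping)
            buffer.clear();
    }

    // Called by grep/less/tee to take the captured text and end the pipe.
    std::string take()
    {
        std::string text = std::move(buffer);
        buffer.clear();
        piping = false;
        return text;
    }
};
```

Note that this keeps the whole output of a command in memory, which is the concern raised earlier in the thread for commands with large outputs.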
@Lephenixnoir From our discussions a couple months ago, there were a reasonable amount of changes suggested (a grep mechanism, pipes, moving shell functions to `lib/`, etc.). These are good ideas but probably deserve their own issue(s) and would have to be done after your changes in #14.
What do you think about merging `ssd` as it is (or with some changes) and then removing it later once a better mechanism is created?
Sorry, this PR completely flew off my radar. I agree, it's better to merge it now because there's zero ETA for the other features we discussed.
A few questions on the changes:
- Do you know whether direct output into a `std::string` is reasonable in performance for large outputs? Should we maybe use a stream instead and then instantiate with a string stream in `_ssd`?
- Good idea for a parameter. Would it be possible to make that a `PrintPass` attribute like the promotion parameters?
Thanks again for your continued involvement!
I tried `std::stringstream` and it doesn't have a noticeable effect on speed (for me). Sticking with `std::string` and reserving 20k characters removed about 10-25 seconds on long outputs, but I decided to go with streams because this simplifies the logic compared to the current `std::string` implementation.
Yep, done. Let me know if you think this attribute should be used in other parts of the `PrintPass`.
No problem!
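For readers following the stream discussion, a minimal sketch of the idea is shown here. `printLine` is a hypothetical function, not the `PrintPass` code; the point is only that one routine taking a `std::ostream &` can serve both the terminal and a capturing caller.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical print routine: writes to whichever stream the caller provides.
void printLine(std::ostream &out, const std::string &mnemonic,
    const std::string &args)
{
    out << "  " << mnemonic << ' ' << args << '\n';
}

// Capturing caller: collect the text in a string stream for later searching.
std::string captureLine(const std::string &mnemonic, const std::string &args)
{
    std::ostringstream captured;
    printLine(captured, mnemonic, args);
    return captured.str();
}
```

Regular printing would call `printLine(std::cout, ...)`, while a search command would use something like `captureLine` and run its regex over the returned text.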