A Close Look At Controlling The MoarVM Profiler
This is slightly tangential to the Rakudo Perl 6 Performance Analysis Tooling grant from The Perl Foundation, but it does interact closely with it, so this is more or less a progress report.
The other day, Jonathan Worthington of Edument approached me with a little paid job: profiling very big codebases can be tedious, especially if you're interested in only one specific fraction of the program. That's why I got the assignment to make the profiler "configurable". On top of that, the Comma IDE will get native access to this functionality.
The actual request was just to allow specifying whether individual routines should be included in the profile via a configuration file. That would have been possible with just a simple text file that contains one filename/line number/routine name per line. However, I have been wanting something in MoarVM that allows much more involved configuration for many different parts of the VM, not just the profiler.
Obligatory cat photo.
That's why I quickly developed a small and simple "domain-specific language" for this and similar purposes.
The language had a few requirements:
- It should not be possible to write a program that loops infinitely
- It should have access to MoarVM internal data that is normally not accessible
- It shouldn't allow changing internal data so that things could break
- It should allow reasonably complex calculations and comparisons
There's also some things that aren't necessary:
- It doesn't need to be very pleasant to write
- It doesn't have to be succinct
- It doesn't have to have every feature under the sun
While thinking about what exactly I should build (before I eventually settled on building a "programming language" for this task) I bounced back and forth between the simplest thing that could possibly work (for example, a text file with a list of file/line/name entries) and the most powerful thing I could implement in a sensible timeframe (for example, allowing a full NQP script). A very important realization was that as long as I require the first line to identify what "version" of configuration program it is, I can completely throw away the current design and put something else in its place if the need ever arises. That allowed me to actually commit to a design that looked at least somewhat promising. And so I got started on what I call confprog.
Here's an example program. It doesn't do very much, but shows what it's about in general:
```
version = 1
entry profiler_static:
log = sf.name;
profile = sf.name eq "calculate-strawberries"
profile |= sf.cu.filename eq "CustomCode.pm6"
```
The entry decides which stage of profiling this program is applied to. In this case, profiler_static means we're seeing a routine for the first time, before it is actually entered. That's why only the information that every individual invocation of the frame in question shares is available, via the variable sf, which stands for Static Frame. The Static Frame also allows access to the Compilation Unit (cu) it was compiled into, which lets us find the filename.
The first line that actually does something assigns a value to the special variable log. This will output the name of the routine the program was invoked for. The next line will turn on profiling only if the name of the routine is "calculate-strawberries". The line after that will also turn on profiling if the filename the routine comes from is "CustomCode.pm6".
Apart from profiler_static, there are a couple more entry points available.
profiler_static, as described above, runs the first time any routine is encountered, and stores the decision until the profile run is finished.
profiler_dynamic will be evaluated every time a routine gets entered that would potentially be profiled otherwise (for example because a profiler_static program turned it on, or because there is no profiler_static program, which means that every routine is eligible for profiling).
Anything about the current execution context is available here, including the name of the current routine, the count and types of arguments passed, the routine that called this routine, and so on.
Information that seems less immediately useful, like "how many times has the GC run yet?" or "how many threads are there right now?", is also available for very special cases.
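Based on the profiler_static example above, a profiler_dynamic program could look roughly like the following sketch. Note that whether the sf variable is spelled the same way in the dynamic entry point is my assumption, not something the syntax shown in this post guarantees:

```
version = 1
entry profiler_dynamic:
profile = sf.name eq "calculate-strawberries"
```

The difference to the static variant is that this decision would be re-made on every entry of the routine, rather than once and for all.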
heapsnapshot will be run every time a GC run happens. It lets the user decide whether a GC run should result in a heap snapshot being taken or not, based on whether the run was a minor or major collection, and arbitrary other factors, including the time since the start of the program, or the time since the last heap snapshot was taken.
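For illustration, here is roughly the heapsnapshot program that produced the parser dump at the end of this post, reconstructed from that dump (the exact syntax is still subject to change):

```
version = 1
entry heapsnapshot:
log = "we're in";
log = sf.name;
log = "i am the confprog and i say:";
log = " no heap snapshots for you my friend!";
snapshot = 0
```

Assigning 0 to the special snapshot variable signals that no snapshot should be taken for this GC run, matching the log message.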
About jit I'm not entirely sure yet. I want to allow turning the specializer or the JIT off completely for certain routines, but I also want to offer control over individual optimizations in spesh (turning off inlining for one particular routine, preventing scalar replacement but only for Rat types, ...) and over the behaviour of the JIT (use the template JIT for everything except one specific piece of one particular routine, ...).
How exactly this should look in the program, I do not know yet. It's also probably not going to be very interesting for end-users, but finding bugs in spesh and the JIT could be easier if you can pinpoint (even automatically!) the exact place something goes wrong down to "which optimization at what position makes the difference".
The syntax is still subject to change, especially before the whole thing is actually in a released version of MoarVM.
There are a whole lot of other things I could imagine being of interest in the near or far future. One place I'm taking inspiration from is how "extended Berkeley Packet Filter" (eBPF for short) programs are being used in the Linux kernel and related pieces of software:
Oversimplifying a bit: BPF was originally meant for tcpdump, so that the kernel doesn't have to copy all data over to the userspace process just for it to decide what is interesting and what isn't. Instead, the kernel receives a little piece of code in the special BPF language (or bytecode) and can make the decision before having to copy anything.
eBPF programs can now also be used as a complex ruleset for sandboxing processes (with "seccomp"), to decide how network packets are to be routed between sockets (that's probably for Software Defined Networks?), what operations a process may perform on a particular device, whether a trace point in the kernel or a user-space program should fire, and so on.
So what's the status of confprog? I've written a parser and compiler that feeds confprog "bytecode" (which is mostly the same as regular MoarVM bytecode) to MoarVM. There's also a preliminary validator that ensures the program won't do anything weird, or crash, when run; it is much too lenient at the moment, though. Then there's an interpreter that actually runs the code. It can already take an initial value for the "decision output value" (great name, isn't it) and it will return whatever value the confprog has set when it runs. The heap snapshot profiler is currently the only part of MoarVM that will actually try to run a confprog, and it uses the value to decide whether to take a snapshot or not.
Next up on the list of things to work on:
- There are barely any operators supported yet, for example for comparing numbers, or for arithmetic.
- There is not yet anything for floating point numbers, but I would like to offer support for that, since it's how we usually represent time.
- Values cannot yet be persisted between runs, so you can't, for example, store the time of the last run, or how often a given decision was taken.
- Printing to a log by assigning to a variable named "log" looks very odd, so the language will want to have syntax for function calls.
- Random decision making would be nice, for stochastic things.
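To sketch the last two points: a future version of the language might express logging as a function call and support random decisions. This syntax (the log and rand calls, as well as version = 2) is purely hypothetical at this point, not anything that is implemented or decided:

```
version = 2
entry profiler_static:
log(sf.name);
profile = rand() < 0.25
```

This would profile each routine with a 25% chance, which could keep profiles of very large codebases manageable.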
Apart from improvements to the confprog programming language, the integration with MoarVM lacks almost everything, most importantly installing a confprog for the profiler to decide whether a frame should be profiled (which was the original purpose of this assignment).
After that, and after building a bit of GUI for Comma, the regular grant work can resume: Flame graphs are still not visible on the call graph explorer page, and heap snapshots can't be loaded into moarperf yet, either.
Thanks for sticking with me through this perhaps a little dry and technical post. I hope the next one will have a little more excitement! And if there's interest (which you can signal by sending me a message on IRC, or posting on reddit, or reaching me via twitter @loltimo, on the Perl 6 discord server, etc.) I can also write a post on how exactly the compiler was made, and how you can build your own compiler with Perl 6 code. Until then, you can find Andrew Shitov's presentations about making tiny languages in Perl 6 on YouTube.
I hope you have a wonderful day; see you in the next one!
PS: I would like to give a special shout-out to Nadim Khemir for the wonderful Data::Dump::Tree module, which made it much easier to see what my parser was doing. Here's some example output from another simple confprog:
```
@0
├ 0 = .Label .Node @1
│ ├ $.name = heapsnapshot.Str
│ ├ $.type = entrypoint.Str
│ ├ $.position is rw = Nil
│ └ @.children = @2
├ 1 = .Op .Node @3
│ ├ $.op = =.Str
│ ├ $.type is rw = Nil
│ └ @.children = @4
│   ├ 0 = .Var .Node @5
│   │ ├ $.name = log.Str
│   │ └ $.type = CPType String :stringy @6
│   └ 1 = String Value ("we're in") @7
├ 2 = .Op .Node @8
│ ├ $.op = =.Str
│ ├ $.type is rw = Nil
│ └ @.children = @9
│   ├ 0 = .Var .Node @10
│   │ ├ $.name = log.Str
│   │ └ $.type = CPType String :stringy §6
│   └ 1 = .Op .Node @12
│     ├ $.op = getattr.Str
│     ├ $.type is rw = CPType String :stringy §6
│     └ @.children = @14
│       ├ 0 = .Var .Node @15
│       │ ├ $.name = sf.Str
│       │ └ $.type = CPType MVMStaticFrame @16
│       └ 1 = name.Str
├ 3 = .Op .Node @17
│ ├ $.op = =.Str
│ ├ $.type is rw = Nil
│ └ @.children = @18
│   ├ 0 = .Var .Node @19
│   │ ├ $.name = log.Str
│   │ └ $.type = CPType String :stringy §6
│   └ 1 = String Value ("i am the confprog and i say:") @21
├ 4 = .Op .Node @22
│ ├ $.op = =.Str
│ ├ $.type is rw = Nil
│ └ @.children = @23
│   ├ 0 = .Var .Node @24
│   │ ├ $.name = log.Str
│   │ └ $.type = CPType String :stringy §6
│   └ 1 = String Value (" no heap snapshots for you my friend!") @26
└ 5 = .Op .Node @27
  ├ $.op = =.Str
  ├ $.type is rw = Nil
  └ @.children = @28
    ├ 0 = .Var .Node @29
    │ ├ $.name = snapshot.Str
    │ └ $.type = CPType Int :numeric :stringy @30
    └ 1 = Int Value (0) @31
```
Notice how it shows the type of most things, like name.Str, as well as cross-references for things that appear multiple times, like the CPType String (marked §6 in the dump). Particularly useful is giving your own classes methods that specify exactly how they should be displayed by DDT. Love it!