July 26, 2018 · grant Instrumented Profiler

Wow, check out this garbage

Hello everyone! It's been more than a month since the last report, but I've been able to put hours in and get code out. And now I'll show you what's come out of the last weeks.

The Garbage Collector

One important aspect of dynamic languages is that memory is managed for the user. You don't usually see malloc and free in your perl code. But objects are constantly being created, often "in the background". Asking the user to take care of freeing up space again for objects that may have been created and become obsolete in the same line of code sounds like a good recipe for user dissatisfaction.

To take this potential full-time-job off the user's hands, MoarVM employs Garbage Collection, specifically a scheme based on "reachability". Whenever space needs to be freed up, the process begins. Starting from a set of objects that MoarVM happens to know for sure are currently needed, references from one object to another are followed until everything "reachable" has been reached and "marked" for being kept. Afterwards, everything that has been created but not "marked" will be removed in a step commonly called "sweeping". Garbage Collectors following this scheme are called "Mark & Sweep Garbage Collectors". Find a section about this on the "Tracing Garbage Collection" wikipedia article, though MoarVM's garbage collector has a few additional tweaks for performance reasons.

Naturally, going through a large amount of objects consumes time. Seeing what the program spends its time doing is an important part of making it perform better, of course. That's why the MoarVM profiler records data about the Garbage Collector's activities.

GC in the profiler

The old HTML profiler frontend already showed you how many GC runs happened during your program's run. For each time the GC ran you'll see how long the run took, whether it went through the nursery only (a "minor" collection) or everything on the heap (a "major" collection, called "full" in the tools). I recently added a column displaying when the run started in milliseconds from the start of recording. However, the old profiler frontend didn't do anything with information recorded by threads other than the main thread. Here's a screenshot from the old frontend:

moar_profiler_gc_screenshot-1

I'm not quite sure why it says it cleared about 4 gigabytes of data during the first two runs, but that'll go on the pile of things to check out later.

For now, I can use this to explain a few things before I go on to show the first version of the new interface and what it does with multithreaded programs.

The profile was taken from a hypered (i.e. automatically multithreaded) implementation of the fannkuch benchmark. The rightmost column shows how much data was copied over from the nursery to the next nursery, how much was taken into the "old generation" (either called "old" or "gen2"), and how much was freed.

There's also a count of how many references there were from objects in the old generation to objects in the nursery, the "gen2 roots". This happens when you keep adding objects to a list, for example. At some point the list becomes "old" and fresh objects that are inserted into it have to be specifically remembered so that they are considered reachable, even if the old objects aren't fully analyzed.

The new frontend

moarperf_gc_screenshot

Looking at the next screenshot, which shows the GC section of the new profiler frontend I've been working on, you'll see it looks very different. The existing information is re-organized. Instead of a column of "time taken" with a bar and "time since start", this info is now in simple text columns in addition to bar charts at the top. You can now expand individual GC runs to get at the other information from the previous profiler: The amounts kept, promoted, and freed, as well as the inter-generational roots. However, they are now per-thread, and there's an additional bar chart. It's lacking a legend or title right now, but if it had one, it'd say that it's the individual start and end times of each thread's GC work, in milliseconds. The little asterisk denotes which thread was responsible for kicking off the GC run.

You'll surely notice another difference: There's a whole lot more bars in the top bar chart than there were entries in the GC table in the old profiler frontend. The simplest explanation is that I just took a profile from a different program. However, it was the same code that generated these two profiles. The difference comes from the fact that the old frontend currently only displays the GC runs it finds in the first thread's list. That corresponds to entries in the new list that feature 1 in the "threads" column.

The near future

There's many more things I want to do with this tab, like getting statistics on a per-thread rather than per-run level, and maybe a "participation tally" graph that'd show when which threads participated. And there's details shown by the graphs that I haven't noticed before. For example, what causes some threads to join in near the end, and does that force the other threads to wait for a long time before doing anything at all? The screenshot below is an example of that pattern.

moarperf_gc_screenshot_2

The questions raised by this fresh perspective on the data we've already been capturing long ago has already made it worth quickly shoving the GC tab in between the other parts of the frontend. I had already started the Routine Overview tab, but it's not quite ready to be shown off. I hope it'll be pretty by the time my next report comes out.

Thanks to The Perl Foundation for the grant, and masak for this beautiful closing paragraph:

In conclusion, all of the people who oppose my plan for world domination will be treated in the most appropriate manner. I wish nothing but the best for my enemies. May you rest easily at night, and may your futures bloom and burgeon!

- Timo

A Perl Programming Language blog.