Can you say that again? You can say that again!

Photo by Bioscience Image Library by Fayette Reynolds / Unsplash

A few weeks ago I was chatting with coralina and she linked me 4:19 of The Zipf Mystery but every time he repeats a word it loops.

It's an instance of a meme format I don't think I had seen before. The basic conceit is, as the title states, every time a word that has been said before is said again, the video loops back to that time.

As I interpret it, the instances of words inside of looped sections don't count for determining the "last time" each word has been said, though it's only a little extra work to implement that interpretation as well.

As a Rakunaut, it of course didn't take me long to attempt a script that turns a bit of text into the "but every time it repeats a word it loops" version, repeating the text in between.

After a bit of hacking, I fed the introductory paragraph of the Raku website into my script and got the following result. Each time a repetition happens, the word that caused it is printed in red, and the repeated text is printed in green:

Hi, my name is Camelia.  I'm the spokesbug for the spokesbug for the
Raku Programming language.  Raku Programming language. Raku has been
developed by a team of dedicated and enthusiastic open source
developers and enthusiastic open source developers and continues to be
developed. by a team of dedicated and enthusiastic open source
developers and continues to be developed. You can help too.  The Raku
Programming language. Raku has been developed by a team of dedicated
and enthusiastic open source developers and continues to be developed.
You can help too. The only requirement is Camelia. I'm the spokesbug
for the Raku Programming language. Raku has been developed by a team
of dedicated and enthusiastic open source developers and continues to
be developed. You can help too. The only requirement is that you can
help too. The only requirement is that you know how to be developed.
You can help too. The only requirement is that you know how to be
developed. You can help too. The only requirement is that you know how
to be nice to be nice to all kinds of dedicated and enthusiastic open
source developers and continues to be developed. You can help too. The
only requirement is that you know how to be nice to all kinds of
people (and continues to be developed. You can help too. The only
requirement is that you know how to be nice to all kinds of people
(and butterflies).  Go to all kinds of people (and butterflies). Go to
#raku has been developed by a team of dedicated and enthusiastic open
source developers and continues to be developed. You can help too. The
only requirement is that you know how to be nice to all kinds of
people (and butterflies). Go to #raku (irc.libera.chat) and
butterflies). Go to #raku (irc.libera.chat) and someone will be nice
to all kinds of people (and butterflies). Go to #raku
(irc.libera.chat) and someone will be glad to #raku (irc.libera.chat)
and someone will be glad to help too. The only requirement is that you
know how to be nice to all kinds of people (and butterflies). Go to
#raku (irc.libera.chat) and someone will be glad to help you know how
to be nice to all kinds of people (and butterflies). Go to #raku
(irc.libera.chat) and someone will be glad to help you get started.

There are definitely some funny bits in there. My favorites include:

You can help too. The only requirement is Camelia.
You can help too. The only requirement is that you can help too. The only requirement is that you know how to be developed.
Go to #raku (irc.libera.chat) and someone will be glad to help you know how to be nice to all kinds of people (and butterflies).

I think I might make a recording of reading the text and edit it to do the correct looping. Maybe I'll see if Whisper can give precise per-word timestamps that I could turn into a command line for sox or ffmpeg to create the final result.

But for now, I'll go through the actual code I used for this. You can already look at and play with the final version on Compiler Explorer here.

The version I linked to on Compiler Explorer begins with a tiny implementation of Terminal::ANSIColor's sub colored:

sub colored($what, $_) {
   when "green" {
       "\e[31m" ~ $what ~ "\e[0m"
   }
   when "red" {
       "\e[32m" ~ $what ~ "\e[0m"
   }
}

The alternative is of course to use Terminal::ANSIColor, but Compiler Explorer doesn't have raku libraries yet. For this case it doesn't really matter that it only supports green and red, and trying to choose any other color just makes no text come out at all 🤔

Oh and to top it off, I accidentally switched the codes for green and red around in the sub, and I have the same switch-around in the code that uses the sub, so both mistakes cancel each other out here. Don't look too closely, haha.
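If more colors were wanted, one possible variation is a small lookup table with a fallback. This is only a sketch: the %codes table and the choice to pass unknown colors through uncolored are assumptions for illustration, not how Terminal::ANSIColor works, and it uses the standard, unswapped ANSI codes.

```raku
# Sketch of a table-driven variant; %codes and the fallback are illustrative.
sub colored($what, $color) {
    my %codes = red => 31, green => 32, yellow => 33, blue => 34;
    with %codes{$color} -> $code {
        "\e[" ~ $code ~ "m" ~ $what ~ "\e[0m"
    } else {
        # Unknown color name: return the text unchanged instead of nothing.
        $what
    }
}

say colored("hello", "blue");
say colored("hello", "chartreuse");
```

With this shape, a misspelled color name still prints the text, just without color.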

Next is the text we want to put in. Since I was prototyping this with my code editor (vim) and executing it again after making a change, I didn't want to paste the source text in every time. For that reason, the input text is part of the source file, instead of reading from $*IN (aka stdin). It could have gone into a separate file as well just as easily with my @input = "text.txt".IO.words for example.

my @input = Q[
Hi, my name is Camelia. I'm the spokesbug for the
Raku Programming language. Raku has been developed
by a team of dedicated and enthusiastic open source
developers and continues to be developed. You can
help too. The only requirement is that you know how
to be nice to all kinds of people (and butterflies).
Go to #raku (irc.libera.chat) and someone will be
glad to help you get started.
].words;

I chose the Q quoting construct here with square brackets because square brackets aren't in the source text, but using heredocs with Q:to/INPUT-TEXT/ for example would have been just as clean.

In that case, the .words can go directly after the Q:to/INPUT-TEXT/ while the input text goes below, with indentation if you like, followed by a line with just INPUT-TEXT in it. The .words method splits the text into runs of consecutive non-whitespace characters, so line wrapping and indentation in the input make no difference to the output.
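As a rough sketch of the heredoc variant, with the input shortened to two lines to keep it small:

```raku
# The heredoc body starts on the line after the statement. Its indentation
# is measured against the terminator line and removed.
my @input = Q:to/INPUT-TEXT/.words;
    Hi, my name is Camelia. I'm the spokesbug for the
    Raku Programming language.
    INPUT-TEXT

say @input.elems;   # 13
say @input[0];      # Hi,
```

Punctuation stays attached to the words at this point, which is why "Hi," keeps its comma.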

Next up, we do a loop over the input array. Using the .pairs method on the array will give us a Pair object each iteration that has a .key with the index of the item and a .value of the word in question.

The result of the for loop goes directly into a result variable. By itself, for is a statement, so we turn it into an expression with the do prefix. That lets us put the value of every iteration directly into our array:

my @result = do for @input.pairs {

You can see that instead of giving a variable to put the pair object into, we just use the default, which is $_, the "topic variable". This lets us refer to .key and .value just like that.
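A minimal sketch of the same pattern on a three-word list (the @words and @lines names are only for illustration):

```raku
my @words = <You can help>;

# No loop variable is named, so each Pair lands in $_,
# and a leading dot calls the method on it.
my @lines = do for @words.pairs {
    .key ~ " => " ~ .value
}

.say for @lines;
# 0 => You
# 1 => can
# 2 => help
```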

Next up, inside the for loop we declare a state variable to hold information about words we've seen already. A state variable behaves like a variable you declared outside of the loop in terms of keeping values from one round to the next, but is only visible inside of the curly braces. I find that this makes it a bit clearer where the variable belongs. After the loop it is no longer relevant, and trying to address it there is just a case of "undeclared variable".

   state %last;
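To see the state behavior in isolation, here is a small sketch that counts how often each word has come up so far (the %seen and @counts names are made up for this example):

```raku
my @counts = do for <be nice to be> {
    # %seen is initialised once and keeps its contents across iterations.
    state %seen;
    ++%seen{$_};
}

say @counts;   # [1 1 1 2]
```

Referring to %seen after the closing brace would be a compile-time error, since the name only exists inside the block.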

I mentioned earlier that the .words method gives us runs of consecutive non-whitespace characters, and that includes punctuation. We don't want the punctuation to count when looking up when a word was last seen, and we also want capitalized and lower-cased versions of a word to count as the same, so we normalize the words before looking them up or storing them in our %last hash:

   my $keyword = .value.comb(/<alpha>/).join("").fc;

We use a simple regex with the .comb method to get every alphabetic character from the input, join them into one string without spaces, and turn the result into fold-case (it's kind of like lower case, but different for some scripts).
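Wrapped in a helper for demonstration purposes (the name keyword-for is hypothetical, the script itself does this inline), the normalization behaves like this on a few words from the input. Note that <alpha> also matches the underscore, which doesn't matter for this text.

```raku
# Hypothetical helper; same expression as the $keyword line above.
sub keyword-for($word) {
    $word.comb(/<alpha>/).join("").fc
}

say keyword-for("Camelia.");   # camelia
say keyword-for("(and");       # and
say keyword-for("#raku");      # raku
say keyword-for("Raku") eq keyword-for("#raku");   # True
```

The last line shows why "#raku" counts as a repeat of "Raku" in the output.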

The next few lines set up the logic to put the index we saw the word at into the hash. Since we want to get whatever was already in the hash before we assign the new value, we have a few ways to make that happen, but the implementation I chose here is a LEAVE block, which is executed when the body of the loop has finished.

I made the choice to use LEAVE rather than just putting the code at the end of the block because I'm also using the last statement of the loop body to give the value that the for loop puts into the result list.

The block itself is pretty straight-forward:

   LEAVE {
       %last{$keyword} = .key;
   }

When leaving the for block, we set the value in %last for the $keyword to the .key, i.e. the index of the word from the input list.
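A small sketch of the ordering, using a made-up @log array to record what runs when:

```raku
my @log;

my @values = do for 1..2 {
    # Runs when the block is left, even though it is written first.
    LEAVE { @log.push("leave $_") }
    @log.push("body $_");
    $_ * 10;    # last statement: the value of this iteration
}

say @values;   # [10 20]
say @log;      # [body 1 leave 1 body 2 leave 2]
```

The LEAVE block does not disturb the value of the last statement, which is what makes it handy here.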

We're almost done! 👍

We now want to grab a "previous position" from the hash, if it exists, and make the repetition happen. Otherwise, the word just goes straight through to the result:

   with %last{$keyword} -> $prevp {
       colored(.value, "green"),
         @input[$prevp ^.. .key].map({ colored($_, "red") })
   } else -> $nothing {
       .value;
   }

The with construct lets us check a value for definedness and assign it into a variable for the block.

The if statement can do the same variable assignment, but it checks for truth value. The very first word in our array would have the index 0, which would count as False, and not execute the block.
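The difference is easy to check with a hash that stores index 0. This is a sketch with a made-up @notes array collecting which branches ran:

```raku
my %last = hi => 0;
my @notes;

# with tests definedness, so the stored index 0 still enters the block.
with %last<hi> -> $prevp {
    @notes.push("with: index $prevp");
}

# if tests truthiness; 0 is false, so the else branch runs instead.
if %last<hi> -> $prevp {
    @notes.push("if: index $prevp");
} else {
    @notes.push("if: skipped");
}

# A missing key yields an undefined value, so with falls through to else.
with %last<camelia> -> $prevp {
    @notes.push("unreachable");
} else {
    @notes.push("with: not seen yet");
}

.say for @notes;
# with: index 0
# if: skipped
# with: not seen yet
```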

So when there is a previous position for our keyword, the result of the iteration should be the word itself, followed by the repeated content. $prevp is the index where the current word was seen before, and .key gives us the current index. We use ^.., which creates a Range just like .. but skips the first value.
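On a short list, with both "to"s at known positions (0 and 3), the two range operators compare like this:

```raku
my @input = <to be nice to all>;

say (0 ^.. 3).list;    # (1 2 3)
say @input[0 ^.. 3];   # (be nice to)
say @input[0 .. 3];    # (to be nice to)
```

Skipping the first index avoids repeating the earlier occurrence of the word, while the current occurrence at the end of the range is included.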

We use colored for the .value as well as every word we copied out of the @input list with the [] postcircumfix operator to make the first word green and the copied words red respectively.

Now that I've looked at the code again and again for writing this post, it occurs to me that there's not really a good reason to pass every word individually through the colored sub with a map. Instead, I could have turned the list I took out of the @input array into a String joined by spaces, which is conveniently exactly what the .Str method on it would do. That can then be fed into colored and we've saved maybe a third of the whole line.
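A sketch of that shorter variant, with the indices hard-coded for illustration and the escape codes written out rather than going through colored:

```raku
my @input = <how to be nice to>;

# .Str on the slice joins the words with single spaces.
my $repeat = @input[1 ^.. 4].Str;
say $repeat;                        # be nice to

# One pair of escape codes around the whole run instead of one per word.
say "\e[32m" ~ $repeat ~ "\e[0m";
```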

Ah well, what can you do! I'm not really golfing the code down to the shortest it could possibly be. It would probably look a bit different if I did.

For the case where we didn't actually have an entry in the %last hash yet for the $keyword we would land in the else branch of this construct. We take the value we got into a named variable so that our $_ doesn't get scribbled over. We could still refer to the $_ from the outer block with $OUTER::_ but I thought that's less pleasing.

All that this block needs to do is get the .value out from the pair so it's just the word, and it's done!

Here's the whole loop in one uninterrupted piece:

my @result = do for @input.pairs {
   state %last;
   my $keyword = .value.comb(/<alpha>/).join("").fc;
   LEAVE {
       %last{$keyword} = .key;
   }
   with %last{$keyword} -> $prevp {
       colored(.value, "green"),
         @input[$prevp ^.. .key].map({ colored($_, "red") })
   } else -> $nothing {
       .value;
   }
}
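To follow the loop without a color terminal, here is the same logic as a sketch on a six-word input, with plain markers swapped in for the colors: square brackets around the word that triggered a repetition and parentheses around each repeated word. The markers and the input are assumptions for this example only.

```raku
my @input = "to be or not to be".words;

my @result = do for @input.pairs {
    state %last;
    my $keyword = .value.comb(/<alpha>/).join("").fc;
    LEAVE {
        %last{$keyword} = .key;
    }
    with %last{$keyword} -> $prevp {
        # Markers stand in for the calls to colored.
        "[" ~ .value ~ "]",
          @input[$prevp ^.. .key].map({ "(" ~ $_ ~ ")" })
    } else -> $nothing {
        .value;
    }
}

say @result.join(" ");
# to be or not [to] (be) (or) (not) (to) [be] (or) (not) (to) (be)
```

The second "to" loops back to index 0 and the second "be" loops back to index 1, so each repeats the four words after its earlier occurrence.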

Now all that's left is to print it out to the terminal.

Just putting the text on the screen as one long string doesn't look good, so I want it word-wrapped. There is a method called naive-word-wrapper on the Str class; however, it is marked with the is implementation-detail trait.

What that means is that we get no guarantees that it will stay around, or behave the same on a different version of rakudo. It's also not expected to be present on other implementations of Raku. For this use case, I think it's totally fine. If the method is gone, we can just output the string without any wrapping of words, and maybe expect our caller to pipe it through some program that does word wrapping.

say @result.join(" ").naive-word-wrapper(:70max);
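If the method going away is a worry, one way to fall back to unwrapped output is to ask the string whether it can do the method first. This is a sketch under that assumption; the $text sample and the narrower 30-column width are just for illustration.

```raku
my $text = "You can help too. The only requirement is that you know how "
         ~ "to be nice to all kinds of people (and butterflies).";

# .^can returns the matching methods, an empty (false) list if there are none.
my $wrapped = $text.^can("naive-word-wrapper")
    ?? $text.naive-word-wrapper(:30max)
    !! $text;

say $wrapped;
```

Either way the words come out in the same order, only the whitespace between them differs.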

Incidentally, when trying that out, I found that neither fmt nor par understand that ANSI color formatting codes have zero visible width when printed 🫣

The naive-word-wrapper implements greedy line wrapping like fmt, rather than an algorithm like par's that tries to find a globally optimal solution for how many words should go on each line. Even so, the result looks a lot more correct, since it actually strips color formatting codes before doing its calculations 👍

Again, you can copy out or play with the whole code, put in your own input text, try to make the code shorter, or whatever you like by following this link to Compiler Explorer.

Normally I'd tell you to leave a comment if you liked the post, but I haven't set up anything yet that would make that easy. Maybe soon I will have the experimental Ghost ActivityPub thing running? But until then, you can reply to this toot.

If you don't have an account that can post to the fediverse, you can also find me on IRC, on the raku mailing list, and if there's a discussion on one of the typical social media discussion sites I might see it.

I hope you'll come back when I publish my next post! Don't forget this blog has an RSS feed 😉

This page uses the neocat emoticons by Volpeon under CC-BY-NC-SA-4.0