cg::71617


Time	Author
250125T173431Z	Unrelate
250125T164426Z	Unrelate
230723T000629Z	Neil
231124T170447Z	Neil
211218T100740Z	Neil
210207T222511Z	Neil (Arithmetic in Retina)
160229T193153Z	Martin E
160212T193333Z	Martin E

Abuse anchors to fail

The standard idiom in .NET regex for enforcing that a group has no matches, typically in conjunction with balancing groups, is (?(Group)(?!)). It uses a negative lookahead of the empty string, (?!), to conditionally fail the match, because it's impossible not to match an empty string. However, depending on context, it is almost always possible to guarantee failure in fewer bytes by using an anchor that cannot match at any relevant position in the string.

In the majority of cases, this is simply ^. If there's something that could have matched, then the rest of your regex has most likely structurally guaranteed that the check is not occurring at the beginning of the string. When matching right-to-left (as with the r flag), substitute $. Things can get a little thorny with the multiline flag, which allows ^ and $ to match at line beginnings and ends, and therefore in the interior of a string. However, they are still zero-width: they neither match "past" newlines nor match the newline characters themselves. So this isn't a problem unless you explicitly match newlines, or use the singleline flag so that . matches them. Even if you do, I'm not aware of any circumstance where \A or \z (which match only the strict beginning or end of the entire string) doesn't work. The one exception would be something highly contrived, where the group being checked for non-matching is at the beginning of the match and can conditionally have a zero-width match because it contains lookaheads. Even \G works, since overlapping matches with v or w match \G only at the starting position instead of at the ends of previous matches.
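Python's re module has no balancing groups, but it does support the (?(group)yes) conditional, so the anchor trick itself can be sanity-checked there. Below is a minimal sketch with a made-up pattern and test strings. Both regexes should reject any match in which the optional group 1 participated, and they should agree on every input. The .NET-specific details (the r, v and w flags, \G) are not modelled.

```python
import re

# Group 1 is an optional "b"; the conditional must fail if it participated.
IDIOM = re.compile(r"a(b)?c(?(1)(?!))")   # conventional: (?!) can never match
GOLFED = re.compile(r"a(b)?c(?(1)^)")     # golfed: ^ can't match after "a...c"


def found(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group() if m else None


for text in ["ac", "abc", "xac", "abcac", "abcabc"]:
    print(repr(text), found(IDIOM, text), found(GOLFED, text))
```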

I was inspired by Regenerate's failure idiom, which uses undefined backreferences (these raise an error in .NET). However, it turns out that this trick has been established practice for a while, and there is already a general regex tip covering the idea.

Getting sorted lengths

Written up on Neil's suggestion

The shortest way to sort lines by length is the straightforward one:

N$`
$.&

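However, if the only information you need afterwards about the lines is their length, you can save 1 byte. First neutralize the identity of every non-newline character (here by replacing each one with .), then sort the results as strings, without a key. This works because every line is then a run of the same character. A shorter line is therefore a prefix of any longer one, so string order coincides with length order: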

.
.
O`
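To convince yourself that the string sort really gives the same order, here's a small Python model of both approaches (the function names are hypothetical, not Retina features). The golfed version blanks every character to . and then sorts plainly.

```python
def sort_by_length(text):
    """Like N$` / $.& : sort the lines numerically by their length."""
    return "\n".join(sorted(text.split("\n"), key=len))


def sorted_lengths_golfed(text):
    """Like . / . / O` : turn every character into '.', then sort as strings."""
    blanked = ["." * len(line) for line in text.split("\n")]
    # Each line is now a run of dots, so a shorter line is a prefix of a
    # longer one and plain string order is the same as length order.
    return "\n".join(sorted(blanked))


sample = "golf\nretina\nab\nregex"
print(sorted_lengths_golfed(sample))
```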

Horizontal or Vertical Mirroring in Retina 0.8.2

If the amount of mirroring is a constant, then it's easy enough: duplicate the string, then reverse the appropriate half. For instance, suppose the buffer is known to contain 10 characters, and you want to mirror them about the last character:

saippuakiv

The following program would then achieve that:

.$
$&$`
10>O$^`.

Try it online! Explanation: The first stage copies the first 9 characters to the end of the line, then the second stage reverses the characters after the first 10. Note that the second stage consists of two lines but the second (empty) line may be omitted if it's the last stage.

If you want to mirror to the left, you can prepend the suffix and then reverse the first 9 characters instead:

^.
$'$&
9O$^`.

Try it online! (Mirrors vikauppias to the left.)
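The two programs above can be modelled with ordinary string slicing. This Python sketch mimics each stage rather than running Retina. It assumes the buffer holds a single line, and the function names are made up for illustration.

```python
def mirror_right(s):
    """Model of `.$` / `$&$`` then `10>O$^`.` (here 10 == len(s))."""
    # Stage 1: the last character is replaced by itself followed by the
    # prefix before it, i.e. the first len(s) - 1 characters are appended.
    buffer = s + s[:-1]
    # Stage 2: reverse every character after the first len(s).
    n = len(s)
    return buffer[:n] + buffer[n:][::-1]


def mirror_left(s):
    """Model of `^.` / `$'$&` then `9O$^`.` (here 9 == len(s) - 1)."""
    # Stage 1: the first character is replaced by the suffix after it,
    # followed by itself, i.e. the last len(s) - 1 characters are prepended.
    buffer = s[1:] + s
    # Stage 2: reverse only the first len(s) - 1 characters.
    k = len(s) - 1
    return buffer[:k][::-1] + buffer[k:]


print(mirror_right("saippuakiv"))
print(mirror_left("vikauppias"))
```

Both calls should print the palindrome saippuakivikauppias.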

Similarly, you can mirror a fixed number of lines. Suppose for instance you wanted a mirror copy of all three lines:

| | | | |
=~=~=~=~=
#########

The following program would then achieve that:

$
¶$`
3>O$^`

Try it online! Explanation: The first stage duplicates the entire buffer, while the second stage reverses the lines after the first three.

If you want to mirror up, then remove the >, so the first 3 lines are reversed instead.
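The same idea for lines, as a Python model. The helper names are hypothetical, and each comment names the Retina stage it stands in for.

```python
def mirror_down(text, n=3):
    """Model of `$` / `¶$`` then `3>O$^`` for an n-line buffer."""
    # Stage 1: append a linefeed and a copy of the whole buffer.
    lines = (text + "\n" + text).split("\n")
    # Stage 2: reverse the order of the lines after the first n.
    return "\n".join(lines[:n] + lines[n:][::-1])


def mirror_up(text, n=3):
    """As above, but `3O$^`` reverses the first n lines instead."""
    lines = (text + "\n" + text).split("\n")
    return "\n".join(lines[:n][::-1] + lines[n:])


art = "| | | | |\n=~=~=~=~=\n#########"
print(mirror_down(art))
print()
print(mirror_up(art))
```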

If the amount of mirroring is a variable, then it might still be possible, but you would need to use .NET balancing groups to ensure that only the characters or lines that need to be reversed are processed by the O command.

Horizontal Mirroring in Retina 1

Horizontal mirroring is straightforward in Retina 1: simply use the $^ function to reverse a substitution. For instance, to horizontally mirror saippuakiv as above:

.$
$^$=

Try it online! Explanation: The reversed input is appended to its prefix, thus mirroring the input. (The code would be even simpler if the last character could be duplicated as well.)

Vertical Mirroring of Horizontally Mirrored Text in Retina 1

If you are mirroring horizontally mirrored text, then a very similar approach will allow you to vertically mirror the whole text.

$
¶$^$`

Try it online! Explanation: The whole buffer is duplicated, but the second copy is reversed. This only works because the text is already horizontally mirrored. Each line is a palindrome, so reversing the whole string just reverses the order of the lines.
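A quick Python check of why the horizontal mirroring matters: appending the reversed buffer gives a vertical mirror only when every line is a palindrome. The second sample below is a counterexample. The function name is hypothetical.

```python
def mirror_vertically(text):
    """Model of `$` / `¶$^$``: append a linefeed and the reversed buffer."""
    return text + "\n" + text[::-1]


# Every line is a palindrome, so reversing the whole string only reverses
# the order of the lines: the result is a true vertical mirror.
print(mirror_vertically("abcba\nxyzyx\n12321"))
# Counterexample: these lines are not palindromes, so the second half comes
# out with each line's characters reversed as well.
print(mirror_vertically("ab\ncd"))
```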

Both of these examples are readily adapted to mirror to the left or up.

If the text is not already horizontally mirrored, then you will need to adapt the Retina 0.8.2 examples (note that the limits syntax changes, so that 3> is now 3, and 3 is now ,2). You can use the V command instead to avoid the second line of the stage: 3,V0^ instead of 3,O$^ (the 0 is needed to avoid reversing the characters of the lines).

Actualising a transliteration pattern in Retina 0.8.2

In Retina 1, it's easy to produce the contents of a transliteration pattern, here the 64 characters used by MIME Base-64:


    64*
    Y`\_`Lld+/

Try it online! Explanation: The first stage inserts 64 _s, and the second stage cyclically transliterates the _s to the 64 Base-64 characters in turn. You can also cyclically extend the pattern, should you want, say, 10 copies of each of the digits: just cyclically transliterate 100 _s into digits.
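For reference, here's a Python model of what the cyclic transliteration does, restricted to a single source character (which is all these examples need). This is only a sketch of the behaviour described above, not Retina's implementation, and cyclic_transliterate is a made-up name.

```python
import string

BASE64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def cyclic_transliterate(buffer, source, target):
    """Each successive occurrence of `source` becomes the next character of
    `target`, wrapping round to the start; other characters are unchanged."""
    out = []
    i = 0
    for ch in buffer:
        if ch == source:
            out.append(target[i % len(target)])
            i += 1
        else:
            out.append(ch)
    return "".join(out)


print(cyclic_transliterate("_" * 64, "_", BASE64))         # like 64* then Y`\_`Lld+/
print(cyclic_transliterate("_" * 100, "_", string.digits))  # 10 copies of each digit
```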

But it turns out not to be that difficult in Retina 0.8.2. Here is the required program:

T`Lld+/`_o
}`$
/

Try it online! Explanation: On each pass through the loop, each character in the string is transliterated to the preceding character of the pattern. The first character of the pattern maps to _ and is therefore deleted. Then a new copy of the last character of the pattern is appended.

The loop stops changing the buffer once it equals the contents of the pattern. At that point, deleting the first character and appending the last character are exactly balanced by the shift of the intervening characters.

You can generate a prefix of a pattern by using the appropriate ending character in the append stage.
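The convergence argument is easy to check with a Python simulation of the loop. This sketch makes three assumptions: the buffer starts empty, T`Lld+/`_o behaves as "delete the first pattern character and map every other pattern character to its predecessor", and the appended character is itself in the pattern. The helper names are hypothetical. Passing a different last character produces the corresponding prefix.

```python
import string

# The characters matched by Lld+/ in order: A-Z, a-z, 0-9, then + and /.
PATTERN = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def shift_down(buffer, pattern):
    """Model of T`Lld+/`_o: the first pattern character maps to _ (deleted)
    and every other pattern character maps to the one before it."""
    out = []
    for ch in buffer:
        i = pattern.find(ch)
        if i == -1:
            out.append(ch)              # not in the pattern: left alone
        elif i > 0:
            out.append(pattern[i - 1])  # shift to the preceding character
        # i == 0: the first pattern character is deleted
    return "".join(out)


def actualise(pattern, last=None, buffer=""):
    """Repeat both stages until a whole pass leaves the buffer unchanged.
    `last` is the character appended by the `$` stage (default: the final
    pattern character); it must be in the pattern or the loop never ends."""
    if last is None:
        last = pattern[-1]
    while True:
        new = shift_down(buffer, pattern) + last
        if new == buffer:
            return buffer
        buffer = new


print(actualise(PATTERN))       # the full 64-character pattern
print(actualise(PATTERN, "Z"))  # appending Z instead yields just the prefix A-Z
```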

The code as shown is designed to be used to produce input as part of a larger program, which is why it does not begin with a loop block; it would need an extra { if it were embedded within a larger program. If the code actually appears at the end, or is the entire program, then you can remove the trailing loop block instead, saving a byte.

Note that when embedded within a larger program the code will normally delete any other copies of the output pattern present within the work buffer while appending a copy of the output pattern. You may be able to avoid the deletion by use of a suitable pattern to restrict the scope of the transliteration. It is also possible to adapt the code to prefix rather than append the desired output.

Using balancing groups to capture the same count twice

Occasionally you need to capture a balancing group twice, for example to match the substring nice when its letters are equally spaced in the input. The naive approach is to capture the characters between n and i twice, and then use balancing groups to ensure that the same number of characters exists between i and c, and between c and e, like this:

n((.))*i(?<-1>.)*(?(1)^)c(?<-2>.)*(?(2)^)e

However, you can capture into capture group 2 while popping from capture group 1, allowing you to then pop the same count from capture group 2 later, saving a byte:

n(.)*i(?<2-1>.)*(?(1)^)c(?<-2>.)*(?(2)^)e

(Normally you would use (?!) to force the match to fail if the count is too low, but in code golf we just use the shortest regex that we know will fail; ^ usually works.)
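Python's re module has no balancing groups, so it can't run either regex above. As a stand-in for testing, here's a brute-force reference that tries every spacing k with n.{k}i.{k}c.{k}e. It should accept exactly the inputs that the balancing-group regexes match. It's a different (and much slower) technique, and the function name is hypothetical.

```python
import re


def has_equally_spaced(text, word="nice"):
    """Brute-force reference: is there some k >= 0 for which
    n.{k}i.{k}c.{k}e (built from `word`) matches somewhere in `text`?"""
    for k in range(len(text)):
        gap = ".{%d}" % k
        if re.search(gap.join(map(re.escape, word)), text):
            return True
    return False


print(has_equally_spaced("nice"), has_equally_spaced("nXiXXcXe"))
```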

Arithmetic in Retina 1

In Retina you need to do your arithmetic in unary. However, Retina 1 makes things slightly easier for you in certain cases, principally because of two constructs:

The repetition operator * accepts an expression as its RHS, whereas the $* operator in Retina 0.8.2 only accepted a character as its RHS
Retina 1 now includes a $(...) grouping construct, which accepts the length modifier $.(...)

It thus becomes possible to perform some arithmetic operations in either unary or decimal, depending on which is convenient. In particular, the group length construct $.(...) calculates the sum of the lengths of its contents. It computes any products using arbitrary-precision integers, so the result is not limited in size.

Addition

This is straightforward.

Convert any decimal numbers to unary using *_
Concatenate the numbers to give a unary result
Group the sum using $(...) if you intend to multiply it or $.(...) to convert it to decimal

2-argument examples:

Unary + Unary = Unary: $1$2
Unary + Unary = Decimal: $.($1$2)
Unary + Decimal = Unary: $1$2*_
Unary + Decimal = Decimal: $.($1$2*) (see below)
Decimal + Decimal = Unary: $1*_$2*_
Decimal + Decimal = Decimal: $.($1*_$2*)
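Here's a small Python model of the addition table, treating *_ as repetition and $.(...) as taking the length. The helper names and sample values are assumptions for illustration, not Retina syntax.

```python
def unary(decimal):
    """Model of `*_`: a decimal number becomes that many underscores."""
    return int(decimal) * "_"


def length(contents):
    """Model of `$.(...)`: the decimal length of its contents."""
    return str(len(contents))


d1, d2 = "17", "25"            # hypothetical decimal captures ($1, $2)
u1, u2 = unary(d1), unary(d2)  # the same values as unary captures

print(u1 + u2 == unary("42"))         # Unary + Unary = Unary:       $1$2
print(length(u1 + u2))                # Unary + Unary = Decimal:     $.($1$2)
print(length(u1 + unary(d2)))         # Unary + Decimal = Decimal:   $.($1$2*)
print(length(unary(d1) + unary(d2)))  # Decimal + Decimal = Decimal: $.($1*_$2*)
```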

Multiplication

Note that * normally takes one decimal and one unary parameter and returns a unary result.

If the numbers are all decimal, add an extra unary parameter of 1
Convert all but one number to decimal using the length modifier
Take the product to give a unary result
Convert the product to decimal by enclosing in $.(...) if desired

2-argument examples:

Unary * Unary = Unary: $.1*$2
Unary * Unary = Decimal: $.($.1*$2)
Decimal * Unary = Unary: $1*$2
Decimal * Unary = Decimal: $.($1*$2)
Decimal * Decimal = Unary: $1*$2*_
Decimal * Decimal = Decimal: $.($1*$2*) (see below)

A Retina replacement string to take the cube of a matched decimal number would be $.(*** (see below).

Further savings

Only the first non-negative integer contained in the LHS of a * operator is significant; there is no limit on junk characters (including -s). In particular, this makes it easier to use $&.
If the LHS of a * is indeed $&, then it is not necessary at the start of a $( construct or replacement string or after another *. Example: $.(*__)
The _ after a * is not necessary at the end of a $( construct or replacement string. Example: $.($1*_$2*)
It is not necessary to close your $( constructs if they end at the end of the replacement string. Example: $.($1$2
When adding the constant values 1 or 2, it is shorter to express them in unary.
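The multiplication rules and the savings above can be modelled the same way in Python. In this sketch, star stands in for Retina 1's * operator: it repeats its RHS by the first non-negative integer found in its LHS. The name is hypothetical, and the sketch assumes the LHS contains at least one digit (it doesn't model what Retina does otherwise). Unlike Retina's $.(...), it actually builds the unary strings, so keep the numbers small.

```python
import re


def star(lhs, rhs):
    """Model of Retina 1's `*`: repeat `rhs` by the first non-negative
    integer in `lhs`, ignoring any junk around it (including a `-`).
    Assumes `lhs` contains at least one digit."""
    return rhs * int(re.search(r"\d+", lhs).group())


def length(contents):
    """Model of `$.(...)`: the decimal length of its contents."""
    return str(len(contents))


n = "12"                                       # stands in for a decimal $1 or $&
print(length(star(n, star("7", "_"))))         # Decimal * Decimal = Decimal: 84
print(length(star(n, star(n, star(n, "_")))))  # $.(*** i.e. $.($&*$&*$&*_): 1728
print(star("-3abc", "_"))                      # junk is ignored: ___
```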

Splitting strings into chunks of equal length `n`

As in most "normal" languages TMTOWTDI (there's more than one way to do it). I'm assuming here that the input doesn't contain linefeeds, and that "splitting" means splitting it into lines. But there are two quite different goals: if the length of the string isn't a multiple of the chunk length, do you want to keep the incomplete trailing chunk or do you want to discard it?

Keeping an incomplete trailing chunk

In general, there are three ways to go about the splitting in Retina. I'm presenting all three approaches here, because they might make a bigger difference when you try to adapt them to a related problem. You can use a replacement and append a linefeed to each match:

.{n}
$&¶

That's 8 bytes (or a bit less if n = 2 or n = 3 because then you can use .. or ... respectively). This has one issue though: it appends an additional linefeed if the string length is a multiple of the chunk length.

You can also use a split stage, and make use of the fact that captures are retained in the split:

S_`(.{n})

The _ option removes the empty lines that would otherwise result from covering the entire string with matches. This is 9 bytes, but it doesn't add a trailing linefeed. For n = 3 it's 8 bytes and for n = 2 it's 7 bytes. Note that you can save one byte overall if the empty lines don't matter (e.g. because you'll only be processing non-empty lines and getting rid of linefeeds later anyway): then you can remove the _.

The third option is to use a match. With the ! option we can print all the matches. However, to include the trailing chunk, we need to allow for a variable match length:

M!`.{1,n}

This is also 9 bytes, and also won't include a trailing linefeed. It also becomes 8 bytes for n = 3 by using ..?.?. However, note that it reduces to 6 bytes for n = 2, because then we only need ..?. Also note that the M can be dropped if this is the last stage in your program, saving one byte in any case.
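Python's re module happens to behave the same way for all three approaches (re.split also keeps captured groups), so here's a sketch you can use to compare their outputs. The function names are hypothetical.

```python
import re


def split_replace(s, n):
    """Like .{n} / $&¶ : append a linefeed after every complete chunk."""
    return re.sub(".{%d}" % n, lambda m: m.group() + "\n", s)


def split_split(s, n):
    """Like S_`(.{n}) : split on complete chunks, keeping the captured
    chunks, and drop the empty pieces (the _ option)."""
    return "\n".join(part for part in re.split("(.{%d})" % n, s) if part)


def split_match(s, n):
    """Like M!`.{1,n} : output every chunk, including a short last one."""
    return "\n".join(re.findall(".{1,%d}" % n, s))


for s in ["abcdefgh", "abcdef"]:
    print([split_replace(s, 3), split_split(s, 3), split_match(s, 3)])
```

The second input shows the replacement's extra trailing linefeed when the length is a multiple of the chunk size.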

Discarding an incomplete trailing chunk

This gets really long if you try to do it with a replacement or a split, because you also need to remove the trailing chunk (if it exists). So we can safely ignore those approaches. Interestingly, for the match approach it's the opposite: it gets shorter:

M!`.{n}

That's 7 bytes, or less for n = 2 or n = 3. Again, note that you can omit the M if this is the last stage in the code.

If you do want a trailing linefeed here, you can get one by appending |$ to the regex.
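In Python, re.findall also returns the empty match produced by |$ at the end of the string, so the same trick can be checked there (hypothetical function name):

```python
import re


def chunks_discard(s, n, trailing_linefeed=False):
    """Like M!`.{n} (or M!`.{n}|$): complete chunks only, one per line."""
    pattern = ".{%d}" % n + ("|$" if trailing_linefeed else "")
    return "\n".join(re.findall(pattern, s))


print(repr(chunks_discard("abcdefgh", 3)))
print(repr(chunks_discard("abcdefgh", 3, trailing_linefeed=True)))
```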

Bonus: overlapping chunks

Remember that M has the & option, which returns overlapping matches (normally not possible with regex). This allows you to get all overlapping chunks (substrings) of a given length from a string:

M!&`.{n}
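Python's re module has no equivalent of the & option, so this sketch swaps in a different technique: a capturing group inside a lookahead, which yields the same list of overlapping chunks. The function name is hypothetical.

```python
import re


def overlapping_chunks(s, n):
    """All substrings of length n: a zero-width lookahead matches at every
    position, and the group inside it captures the next n characters."""
    return re.findall("(?=(.{%d}))" % n, s)


print(overlapping_chunks("abcde", 3))
```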

Combine loops if possible

In non-trivial computations you'll often find yourself using several loops to process data:

+`stage1
+`stage2
+`stage3

So this runs stage1 until the output converges, then stage2 until the output converges and then stage3 until the output converges.

However, it's always worth examining the stages in detail. Sometimes it's possible to run the loop in an interleaved fashion as stage1, stage2, stage3, stage1, stage2, stage3, ... instead (this depends a lot on what the stages actually do, but sometimes they make completely orthogonal changes or work well as a pipeline). In this case you can save bytes by wrapping them in a single loop:

{`stage1
stage2
}`stage3

If stage1 is the first stage or stage3 is the last stage of the program, you can even omit one of those braces (which means this can already save bytes for a loop of two stages).
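To see why it "depends a lot on what the stages actually do", here's a Python model of both loop structures. The stages are made-up regex replacements, and each one replaces all non-overlapping matches once per pass, as a Retina replace stage does. The helper names are hypothetical. Orthogonal stages give the same result either way, but stages that feed each other can differ.

```python
import re


def stage(pattern, replacement):
    """One replace stage: substitute every non-overlapping match once."""
    return lambda s: re.sub(pattern, replacement, s)


def until_stable(step, s):
    """Run `step` repeatedly until the string stops changing."""
    while True:
        t = step(s)
        if t == s:
            return s
        s = t


def separate_loops(stages, s):
    """+`stage1 / +`stage2 / +`stage3: each converges before the next runs."""
    for st in stages:
        s = until_stable(st, s)
    return s


def combined_loop(stages, s):
    """{`stage1 / stage2 / }`stage3: one pass of every stage, repeated."""
    def one_pass(x):
        for st in stages:
            x = st(x)
        return x
    return until_stable(one_pass, s)


# Orthogonal stages (each collapses a different run): both forms agree.
collapse = [stage("aa", "a"), stage("bb", "b"), stage("cc", "c")]
print(separate_loops(collapse, "aaaabbbccccc"), combined_loop(collapse, "aaaabbbccccc"))

# Interacting stages (adjacent swaps): the results differ.
swaps = [stage("ba", "ab"), stage("cb", "bc"), stage("ca", "ac")]
print(separate_loops(swaps, "cba"), combined_loop(swaps, "cba"))
```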

A recent use of this technique can be seen in this answer.