| Bytes | Lang | Time | Author |
|---|---|---|---|
| 26 | Haskell + hgl | 240724T135452Z | Wheat Wizard |
| 16 | sed 4.2.2 | 240725T052036Z | Digital Trauma |
| 43 | JavaScript (ES6) | 240724T081140Z | Arnauld |
| 82 | Python 3.8 (pre-release) | 240724T095658Z | Jitse |
| 30 | Charcoal | 240725T083936Z | Neil |
| 105 | Setanta | 240725T071331Z | bb94 |
| 29 | Wolfram Language (Mathematica) | 240724T225458Z | att |
| 8 | Nekomata | 240725T021308Z | alephalpha |
| 9 | QuadR ≡ | 240724T085533Z | Adám |
| 14 | Brachylog | 240724T223135Z | Unrelated String |
| 18 | Perl 5 + pl | 240724T205349Z | Dom Hastings |
| 15 | Brachylog | 240724T203657Z | DLosc |
| 11 | Retina 0.8.2 | 240724T173832Z | Neil |
| 48 | R | 240724T103921Z | pajonk |
| 10 | Japt | 240724T093257Z | Shaggy |
| 8 | 05AB1E (legacy) | 240724T083148Z | Kevin Cruijssen |
Haskell + hgl, 26 bytes
yyc$sk$h_>~l2 aT(mY<xys)ʃ
This uses the adapted rule from Nitrodon, which has not been proven correct, but at least has not been proven incorrect.
Explanation
yyc: Repeat a function until reaching a fixed point.
sk: Replace all matches of a parser with their results.
h_: Parse some string.
>~: Feed the result of that parse into a function to create a second parser.
mY<xys: Parse some number of characters from the input.
aT: Ignore the result of that and give the result of ...
ʃ: Parse the input string and return it.
This finds occurrences of xyx where y is made up of a subset of the characters of x, and replaces them with x until a fixed point is reached. y can potentially be empty here.
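As a concreteness check, the rule can be sketched in Python (my own illustration, not the hgl code; `reduce_adapted` and `collapse_once` are names I've made up): scan for an occurrence of x + y + x with the characters of y a subset of those of x, collapse it to a single x, and repeat to a fixed point.

```python
def collapse_once(s):
    """Find one occurrence of x + y + x, where the characters of y form
    a subset of those of x (y possibly empty), and replace it by a
    single x.  Return None if no such occurrence exists."""
    n = len(s)
    for i in range(n):                              # start of the first x
        for lx in range(1, (n - i) // 2 + 1):       # length of x
            x = s[i:i + lx]
            for ly in range(n - i - 2 * lx + 1):    # length of y
                if (set(s[i + lx:i + lx + ly]) <= set(x)
                        and s[i + lx + ly:i + 2 * lx + ly] == x):
                    return s[:i] + x + s[i + 2 * lx + ly:]
    return None

def reduce_adapted(s):
    # Repeat the collapse until no occurrence remains (a fixed point).
    while (t := collapse_once(s)) is not None:
        s = t
    return s
```

For example, "abaab" collapses with x = "ab", y = "a" to "ab", while "abcab" is left alone because 'c' does not occur in "ab".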
Reflection
This is ok. I'm not too happy with it but I initially expected much worse. This really has more of a glue problem than anything else.
- I could add `l2 aT` and `l2 aK`; they will probably be used again sometime.
- I have `(?*>)` and `(<*?)` for `aT<py` and `aK^.py`, and `(+*>)` and `(<*+)` for `aT<so` and `aK^.so`. I could use `(**>) = aT<my` and `(<**) = aK^.py`.
The above are minor fixes, but I think the thing that would really fix the glue problem is having a version of (>~) which just returns the result from the left-hand side instead of the right-hand side. So:
(<~) = f' $ fb < ap pM
In which case this could look like:
yyc$sk$h_<~(mY<xys)>~ʃ
Although this wouldn't make this answer shorter than implementing either of the above, it does seem like the solution lightest on glue, and an operation with large reuse potential.
Old versions
Here are some old versions which use the incorrect algorithm given in the question and currently used by all the other answers:
Parser, ~~21~~ 13 bytes
yyc$sk$h_>~ʃ
Explanation
yyc: Repeat until reaching a fixed point.
sk: Apply a parser as many times as possible to substrings.
h_>~ʃ: Parse some string twice. This returns the parsed string only once.
No parser, 21 bytes
yyc$mBl<(he~<<gr)<<pt
Explanation
yyc: Repeat until reaching a fixed point.
pt: Get all partitions.
gr: Group each partition into groups of contiguous equal elements.
he: Get the first element of each group.
mBl: Get the smallest result.
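The same pipeline can be sketched in Python (my illustration, with my own helper names; I'm assuming "smallest" means shortest, as in the 05AB1E answer below): enumerate partitions, collapse runs of equal adjacent pieces, and keep the shortest join.

```python
from itertools import groupby

def partitions(s):
    """All ways to split s into contiguous nonempty pieces."""
    if not s:
        yield []
        return
    for i in range(1, len(s) + 1):
        for rest in partitions(s[i:]):
            yield [s[:i]] + rest

def step(s):
    # For each partition, keep one copy per run of equal adjacent
    # pieces (gr then he), join, and take the smallest result (mBl).
    return min((''.join(k for k, _ in groupby(p)) for p in partitions(s)),
               key=len)

def reduce_nonparser(s):
    # yyc: iterate the step until a fixed point.
    while (t := step(s)) != s:
        s = t
    return s
```

Note the exponential cost: a string of length n has 2^(n-1) partitions, which matches the reflection below about this approach being inefficient.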
Reflection
Although I only have two versions of this answer I am splitting the reflection into three based on three potential solutions to this challenge.
Parser version
I am happy with the parser version. I was annoyed by the operator precedence between (~<) and (#|)/(++) in the original version of the answer, but I'm not sure it's actually wrong in general, and now it's no longer a problem.
In the interest of coming up with something to improve here:
- Maybe `yyc<sk` could have a builtin? I think this idiom of "apply a parser repeatedly to substrings until reaching a fixed point" could come up again. I just really kind of don't like making more parser consumers.
- Weird as it is, I think `fb ʃ` could have a builtin. This isn't the first time I've used it; you can also use it for things like palindrome checking. Although I don't think it's useful enough to justify a 2-byte name, and with a 3-byte name it wouldn't even save anything here.
Regex version
I don't think I could reasonably do this with regex right now, since I haven't implemented back-referencing yet. This is just yet another push to do that.
Non-parser version
I am rather frustrated with trying to build a non-parser version here. There is a lot to improve on this front.
- A contiguous nub function would be useful.
- I would like an "unpairs" function that basically undoes what `pA` does.
- `sps` finds all the splits into two pieces, but here I'd like all partitions into 3 pieces. It'd be good to have a builtin which takes a number, \$n\$, and finds all the partitions into \$n\$ pieces, with presets for small numbers like 3. `pST eL3` almost does what I want, but it returns a list, which is a bit annoying. It's also inefficient and 7 bytes long.
Python 3.8 (pre-release), 82 bytes
lambda s:(n:=len(s))==[s:=s.replace((x:=s[i%n:i%~n])+x,x)for i in range(n**3)]or s
-14 bytes thanks to xnor
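For readers decoding the golfed lambda: it repeatedly replaces a doubled slice x+x by a single x, running enough iterations to guarantee a fixed point. A rough ungolfed equivalent (my own helper names; the lambda's particular slice-scanning order is not reproduced here):

```python
def step(s):
    # Find some substring x whose doubling x+x occurs in s, and
    # replace the doubled occurrence(s) by a single x.
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            x = s[i:j]
            if x + x in s:
                return s.replace(x + x, x)
    return s

def reduce_doubles(s):
    # Iterate until the string stops changing.
    while (t := step(s)) != s:
        s = t
    return s
```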
Charcoal, 30 bytes
W⌈ΦEθ⌈ΦEλ✂θμλ¹№θײμκ≔⪫⪪θײιιθθ
Try it online! Link is to verbose version of code. Explanation:
W⌈ΦEθ⌈ΦEλ✂θμλ¹№θײμκ
While there are substrings of the input that appear doubled in the input, take the maximum, and...
≔⪫⪪θײιιθ
... deduplicate that substring from the input.
θ
Output the finally reduced string.
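The loop can be mirrored in Python (my sketch; I'm assuming Charcoal's maximum on strings is lexicographic, and `charcoal_like` is a name of my own):

```python
def charcoal_like(s):
    while True:
        # All substrings of s that appear doubled somewhere in s.
        doubled = [s[i:j]
                   for i in range(len(s))
                   for j in range(i + 1, len(s) + 1)
                   if s[i:j] * 2 in s]
        if not doubled:
            return s
        m = max(doubled)              # take the maximum such substring
        s = m.join(s.split(m * 2))    # split on m+m, rejoin with single m
```

The split/join on the last line is the same deduplication trick as the Charcoal ≔⪫⪪θײιιθ step.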
Setanta, 105 bytes
gniomh(s){le t idir(0,fad@s)le j idir(1,fad@s+1)le i idir(0,j){t=cuid@s(i,j)s=athchuir@s(t+t,t)}toradh s}
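As I read the Setanta (gniomh = function, fad = length, cuid = slice, athchuir = replace, toradh = return), this is three nested loops that replace every doubled slice t+t by t, with the outer loop acting as a pass counter. An approximate Python transliteration (my reading; loop bounds here are evaluated once per loop entry, which I believe is close enough for illustration):

```python
def setanta_like(s):
    for _ in range(len(s)):              # outer pass counter
        for j in range(1, len(s) + 1):   # slice end
            for i in range(j):           # slice start
                t = s[i:j]
                if t:                    # skip empty slices as s shrinks
                    s = s.replace(t + t, t)
    return s
```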
Wolfram Language (Mathematica), ~~31~~ 29 bytes
f[a___,b__,b__,c___]=f[a,b,c]
Input is a list of characters. Returns the characters of the reduced string, wrapped in f.
NonCommutativeMultiply (**) is a conveniently Flat undefined operator. Unfortunately, it needs to be Unprotected before we can add a definition. Try it online!
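In Wolfram pattern terms, a___ matches a possibly empty sequence and b__ a nonempty one, so the rule rewrites any split a+b+b+c with a doubled b. That can be imitated on a list of characters in Python (a sketch with a search order of my own; Mathematica's matcher prefers shorter sequences first, which this approximates):

```python
def f(chars):
    # Emulate f[a___, b__, b__, c___] = f[a, b, c]: look for a split
    # a + b + b + c with b nonempty and repeated, rewrite, and recurse
    # until no rule applies.
    n = len(chars)
    for i in range(n):                          # length of a
        for k in range(1, (n - i) // 2 + 1):    # length of b
            if chars[i:i + k] == chars[i + k:i + 2 * k]:
                return f(chars[:i + k] + chars[i + 2 * k:])
    return chars
```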
Nekomata, 8 bytes
ʷ{JYᵗƶ¿j
ʷ{JYᵗƶ¿j
ʷ{ Loop until failure:
J Nondeterministically split the input into parts
e.g. "QUAQUAUAL" may become ["QUA", "QUA", "UA", "L"]
Y Run length encode
e.g. ["QUA", "QUA", "UA", "L"] -> ["QUA", "UA", "L"], [2, 1, 1]
ᵗƶ Check that at least one count is greater than 1
e.g. [2, 1, 1] passes
¿ If the check passes, drop the counts; otherwise fail
e.g. ["QUA", "UA", "L"], [2, 1, 1] -> ["QUA", "UA", "L"]
j Join
e.g. ["QUA", "UA", "L"] -> "QUAUAL"
This may output the same result multiple times. You can add the -1 flag to only output the first result.
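A deterministic Python sketch of the same idea (my illustration; an exhaustive search stands in for Nekomata's nondeterministic choice, so the order in which splits are tried is an assumption):

```python
from itertools import groupby

def splits(s):
    """All ways to split s into contiguous nonempty parts (J)."""
    if not s:
        yield []
        return
    for i in range(1, len(s) + 1):
        for rest in splits(s[i:]):
            yield [s[:i]] + rest

def step(s):
    # One pass of the loop body J Y ᵗƶ ¿ j: find a split whose
    # run-length encoding has some count > 1, drop the counts, join.
    # Return None on failure.
    for parts in splits(s):
        runs = [(k, len(list(g))) for k, g in groupby(parts)]
        if any(c > 1 for _, c in runs):
            return ''.join(k for k, _ in runs)
    return None

def nekomata_like(s):
    # ʷ{...}: loop until the body fails.
    while (t := step(s)) is not None:
        s = t
    return s
```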
QuadR ≡, 9 bytes
Blatant rip-off of l4m2's golf of Arnauld's JavaScript solution — go upvote!
(.+)\1
\1
Brachylog, ~~15~~ 14 bytes
{c~c₂ᶠo∋~jᵗc}ˡ
I wouldn't be surprised if there's a better way to do this.
...But, I mostly just took Jitse's comment about requiring recursion as a challenge.
{ }ˡ Left reduce by:
c Concatenate the new element to the accumulator,
~c₂ then partition that result into two possibly-empty slices,
ᶠo∋ trying the partitions in lexicographic order.
~jᵗ Un-double the last slice,
c and concatenate the slices back together.
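In Python the left fold might look like this (my sketch with my own names; the order in which ~c₂ᶠo∋ tries partitions is approximated here by preferring the longest un-doublable tail):

```python
from functools import reduce

def absorb(acc, ch):
    # c: concatenate the next character onto the accumulator.
    s = acc + ch
    # ~c₂ / ~jᵗ: try splits prefix + tail; if the tail is some x
    # doubled, collapse the tail to a single x.
    for i in range(len(s) + 1):
        tail = s[i:]
        half = len(tail) // 2
        if tail and len(tail) % 2 == 0 and tail[:half] == tail[half:]:
            return s[:i] + tail[:half]
    return s

def brachylog_like(s):
    # { }ˡ: left reduce over the characters of s.
    return reduce(absorb, s, "")
```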
Brachylog, ~~16~~ 15 bytes
~c₃↺{h&~j}ʰ↻c↰|
Explanation
~c₃↺{h&~j}ʰ↻c↰|
~c₃ "Unconcatenate" the input string into three substrings
↺ Rotate the first element to the end
{ }ʰ Apply this predicate to the new first element:
h& Assert that it is nonempty
~j Assert that it consists of a substring repeated twice,
and return that substring
↻ Rotate the final element back to the beginning
c Concatenate the list back together
↰ Call the main predicate again on the result
| If there was no way to partition the input string that
satisfied the assertions, return the input unchanged
Retina 0.8.2, 11 bytes
+`(.+)\1
$1
Try it online! Link includes test cases. Explanation: Blatant rip-off of @Adám's rip-off of @l4m2's golf of @Arnauld's answer.
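The same program translates almost directly to Python's re module (the +` prefix is Retina's run-until-no-change loop; `collapse` is a name of my own):

```python
import re

def collapse(s):
    # Apply (.+)\1 -> \1 to every non-overlapping match, and repeat
    # whole passes until the string stops changing.
    while True:
        t = re.sub(r'(.+)\1', r'\1', s)
        if t == s:
            return s
        s = t
```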
Japt, 10 bytes
I didn't understand the spec at all, so this is based on the worked example & test cases.
e"(.+)%1"Ï
Try it (includes all test cases)
05AB1E (legacy), 8 bytes
Δ.œ€ÔJéн
Try it online or verify all test cases.
Explanation:
Δ # Loop until the result no longer changes:
.œ # Get all partitions of the current string
# (which will use the implicit input-string in the first iteration)
€ # Map over each partition:
Ô # Connected uniquify all parts in this partition
J # Join each connected uniquified partition back together
é # Sort these strings by length (shortest to longest)
н # Pop and keep the first/shortest one
# (after the loop, the result is output implicitly)
Uses the legacy version of 05AB1E instead of the latest version, since in the latest version the connected uniquify builtin Ô interprets stringified numbers as numbers instead of strings. E.g. with input 11.0+1+1e1, the legacy program correctly connected-uniquifies ["1","1",".0","+1","+1","e1"] to ["1",".0","+1","e1"] → "1.0+1e1", whereas the new version of 05AB1E will connected-uniquify ["1","1.0","+1","+1e1"] to ["1"] → "1", since it interprets them all as the same number (1): try it online.
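For concreteness, the legacy (string-equality) adjacent deduplication of the partition above can be reproduced with itertools.groupby, which also compares adjacent elements by plain equality (my illustration):

```python
from itertools import groupby

# Connected-uniquify one partition of "11.0+1+1e1" by string equality,
# matching the legacy behaviour described above.
parts = ["1", "1", ".0", "+1", "+1", "e1"]
result = ''.join(k for k, _ in groupby(parts))
print(result)   # 1.0+1e1
```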