| Bytes | Lang | Time | Link |
|---|---|---|---|
| 103 | Haskell | 220529T082929Z | Wheat Wi |
| 049 | Raku | 230718T001932Z | bb94 |
| 045 | Vyxal | 220601T022435Z | naffetS |
| 034 | 05AB1E | 220530T132807Z | Kevin Cr |
| 038 | Retina 0.8.2 | 220528T145745Z | Neil |
| 059 | Java JDK | 220530T132357Z | Olivier |
| 039 | Charcoal | 220530T125622Z | Neil |
| 113 | Python 2 | 220530T033540Z | math jun |
| 078 | Haskell + hgl | 220529T235850Z | Wheat Wi |
| 193 | C# Visual C# Interactive Compiler | 220529T024420Z | Gymhgy |
| 650 | Python3 | 220528T221226Z | Ajax1234 |
Haskell, 154 108 103 bytes
5 bytes saved thanks to Unrelated String
v=(`elem`"iu@oEOa")
g"S"="tS"
g"Z"="dZ"
g x=x
m(x:y:z)|v x>v y=g[y]++[x]|1>0=m$y:z
m x=x
m.tail.reverse
Explanation
This is overall pretty simple. The first thing we do is reverse the input, because the pitch accent is easier to describe relative to the end of a word than the beginning. From there we use tail to remove the final character. If that character was part of a primary mora, we've broken that mora up, and the solution is now the final remaining primary mora. If it wasn't, we've removed the final mora entirely, and the solution is again the final remaining primary mora.
So we are looking for the final primary mora. In the reversed string, that's a vowel followed by a non-vowel. m does just that: it trims until it finds a vowel followed by a consonant and returns them reversed. We use a small bit of extra logic to handle tS and dZ: if the non-vowel was S or Z, we replace it with tS or dZ respectively; otherwise we leave it alone.
And that's it, it gives us the accent peak.
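For readers less used to golfed Haskell, here is a rough ungolfed sketch of the same idea in Python. The name `accent_peak` is made up for illustration, and the sample words are arbitrary strings in the challenge's alphabet rather than official test cases. Like the golfed code, it assumes S and Z only ever appear inside tS and dZ.

```python
VOWELS = set("iu@oEOa")

def accent_peak(word):
    # Reverse the word and drop what was its final character (tail . reverse).
    rev = word[::-1][1:]
    # Scan for the first vowel that is immediately followed by a non-vowel.
    for x, y in zip(rev, rev[1:]):
        if x in VOWELS and y not in VOWELS:
            # Assumption carried over from the golfed code: S and Z are always
            # the second half of tS and dZ, so restore the full digraph.
            onset = {"S": "tS", "Z": "dZ"}.get(y, y)
            return onset + x
    # No such pair: like `m x=x`, return whatever single character is left.
    return rev[-1:]

print(accent_peak("kana"))   # ka
print(accent_peak("tSaka"))  # tSa
print(accent_peak("aka"))    # a
```

The last call shows the fall-through case: a word that starts with a vowel and has no later consonant-vowel pair before its final letter yields that initial vowel.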
Raku, 49 bytes
Port of Olivier Grégoire’s Java answer.
{S:g/.*((^|<-[iu@oEOaSZ]>S?Z?)<[iu@oEOa]>).+/$0/}
05AB1E, 34 bytes
¨.γ"tSdZ"så_N*}ü2¬нšʒžM'@«slåJ}θJÔ
Inspired by @WheatWizard♦'s Haskell answer.
Try it online or verify all test cases.
Explanation:
¨ # Remove the final character of the (implicit) input-string
.γ # Adjacent group by:
s # Check if the current character
å_ # is NOT in
"dtSZ" # string "dtSZ"
N* # Multiply it by the index, so everything else is in its own group
}ü2 # After the group-by: get all overlapping pairs of this list
¬нš # For the edge case of a leading vowel:
¬ # Get the first pair (without popping the list)
н # Pop and get its first character
š # Prepend it to the list
ʒ # Filter this list of pairs (plus the single leading item) by:
žM # Push the vowels "aeiou"
'@« '# Append a "@"
s # Swap so the current pair is at the top of the stack
l # Convert all letters to lowercase
å # Check if it's in the vowels string
J # Join these checks together
# (truthy: "1"/"01"; falsey: "0"/"00"/"10"/"11")
}θ # After the filter: pop and keep the last valid item
J # Join this pair back together to a string
Ô # For the edge case of "tt" or "dd":
Ô # Connected uniquify
# (after which the result is output implicitly)
Retina 0.8.2, 102 77 39 38 bytes
.*((^|[^iu@oEOaSZ]S?Z?)[iu@oEOa]).+
$1
Try it online! Link includes test cases. Explanation: Now heavily inspired by @WheatWizard's Haskell answer, simply matches the last nonterminal vowel with its immediately preceding consonant or at the beginning of the string. Edit: Saved 1 byte thanks to @OlivierGrégoire.
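As a sketch of how the same pattern behaves outside Retina, it can be tried with Python's `re` module. The function name `accent_peak_regex` and the sample words are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as the Retina program: a greedy .* pushes the match as far
# right as possible, and the trailing .+ forces at least one character to
# remain after the captured vowel, so a word-final vowel is never chosen.
PATTERN = re.compile(r".*((^|[^iu@oEOaSZ]S?Z?)[iu@oEOa]).+")

def accent_peak_regex(word):
    # Replace the whole match with capture group 1, as `$1` does in Retina.
    return PATTERN.sub(r"\1", word)

print(accent_peak_regex("kana"))   # ka
print(accent_peak_regex("tSaka"))  # tSa
print(accent_peak_regex("aka"))    # a
```

Excluding S and Z from the consonant class is what forces the match to start at the t or d of a digraph instead of in the middle of it.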
Java (JDK), 59 bytes
s->s.replaceAll(".*((^|[^iu@oEOaSZ]S?Z?)[iu@oEOa]).+","$1")
Credits
- -96 bytes by porting Neil's Retina answer, thanks to Kevin Cruijssen.
Charcoal, 39 bytes
≔…θ⌈⌕A⁺00⭆…θ⊖Lθ№iu@oEOaι01η✂η±⁺²№SZ§η±²
Try it online! Link is to verbose version of code. Alternative formulation, also 39 bytes:
≔✂θ⁰±⌕⪫00⭆Φ⮌θκ№iu@oEOaι10¹η✂η±⁺²№SZ§⮌η¹
Try it online! Link is to verbose version of code. Explanation: Ports of @WheatWizard's Haskell answer.
≔…θ⌈⌕A⁺00⭆…θ⊖Lθ№iu@oEOaι01η
Except for the last letter, map each letter to whether it is a vowel or not, prefix 00 to the result, find the last index of 01, and truncate the input string at that point, which thanks to the 00 prefix will be just after the desired primary mora, although the second 0 does double duty by allowing an initial vowel to be detected as a primary mora.
≔✂θ⁰±⌕⪫00⭆Φ⮌θκ№iu@oEOaι10¹η
Reverse the string except for the last letter, map each letter to whether it is a vowel or not, wrap the result in 00, find the index of 10, and remove that many characters from the end of the input string. The leading 0 adjusts the find index to be the number of characters to remove while the trailing 0 allows an initial vowel to match as a primary mora.
✂η±⁺²№SZ§η±²
✂η±⁺²№SZ§⮌η¹
Output the last two characters of the remaining string, unless the second last character is S or Z, in which case output the last three.
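The first formulation can be sketched in Python roughly as follows. This is only an illustration of the flag-string trick; `accent_peak_flags` is a made-up name, the sample words are arbitrary, and the sketch assumes the word contains at least one vowel before its last letter.

```python
VOWELS = "iu@oEOa"

def accent_peak_flags(word):
    # One flag per letter except the last: 1 for a vowel, 0 otherwise,
    # with "00" prefixed so that an initial vowel still produces "01".
    flags = "00" + "".join("1" if c in VOWELS else "0" for c in word[:-1])
    # Because of the two-character prefix, the last index of "01" is exactly
    # the length of the prefix of the word that ends with the wanted vowel.
    cut = flags.rfind("01")
    head = word[:cut]
    # Take two characters, or three if the second-to-last one is S or Z.
    width = 3 if head[-2:-1] in ("S", "Z") else 2
    return head[-width:]

print(accent_peak_flags("kana"))     # ka
print(accent_peak_flags("tSaka"))    # tSa
print(accent_peak_flags("aka"))      # a
```

For "kana" the flag string is "00010", whose last "01" sits at index 2, so the word is cut down to "ka".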
Python 2, 113 bytes
lambda s,v='iu@oEOa':(['dt'[i<'T'][:i in'SZ']+i+j for i,j in zip(s[:-1],s[1:-1])if(i in v)<(j in v)][-1:]or s)[0]
Heavily inspired by Wheat Wizard's Haskell answer.
Approach
- Split the word, excluding its last letter, into overlapping two-letter chunks
- Find all chunks of the form CV (consonant then vowel)
- If no such chunk exists, return the first letter of the word
- Otherwise, return the last such chunk, prefixed with t or d if it starts with S or Z
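The steps above can be written out in a more readable form. This is an ungolfed sketch, written so that it runs on Python 3 as well; the name `accent_peak_pairs` and the sample words are illustrative assumptions.

```python
def accent_peak_pairs(s, vowels="iu@oEOa"):
    # S and Z are the second letters of the digraphs tS and dZ, so a chunk
    # starting with one of them gets its missing first letter back.
    missing = {"S": "t", "Z": "d"}
    chunks = [
        missing.get(i, "") + i + j
        # Pair each letter with its successor, never reaching the last letter.
        for i, j in zip(s[:-1], s[1:-1])
        # Keep only consonant-then-vowel chunks.
        if i not in vowels and j in vowels
    ]
    # Last matching chunk, or the first letter when nothing matched.
    return chunks[-1] if chunks else s[0]

print(accent_peak_pairs("kana"))   # ka
print(accent_peak_pairs("tSaka"))  # tSa
print(accent_peak_pairs("aka"))    # a
```

The golfed lambda gets the same prefix with `'dt'[i<'T'][:i in'SZ']`, which relies on 'S' sorting before 'T' and 'Z' after it.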
Haskell + hgl, 78 bytes
v=kB$fe"iu@oEOa"
rv<gj<gP(h_*>l2p(p<v)(nA v*>hds++pM""en++(ʃ>/wR"St Zd")))<rv
Explanation
This works a lot like my Haskell answer.
First we set up v, which is a parser that accepts any vowel. With that, the first step in our function is to reverse the input with rv. We do this since we want to do things relative to the end of the list rather than the front of the list.
From here we have the actual parser.
h_*>l2p(p<v)(nA v*>hds++pM""en++(ʃ>/wR"St Zd"))
The actual parser just finds all primary morae except ones that end the list. First it does h_, which parses one or more characters, then we parse a primary mora.
Since h_ never parses 0 characters this breaks up any primary mora that would be at the end of the list.
The part that parses a primary mora is a little bit more complex
l2p(p<v)(nA v*>hds++pM""en++(ʃ>/wR"St Zd"))
But it can be broken down into further parts. First it parses a vowel with p<v, which parses a single vowel as a string. Then it parses a non-vowel: a negative look-ahead, nA v, checks that the next character is not a vowel, and then it parses one of the following: any one character (hds), the end of the string (pM""en), or one of the digraphs (ʃ>/wR"St Zd"). Both the first and last options can match at the same time, but since the digraphs are placed last they will have higher priority in the final result.
Once we have that parser we can run it. We run it with gP to get all results as a list. We use gj to get the last result, which, due to the way we've done things, will be the mora closest to the end of the string. This is also where the digraphs get higher priority than single characters: they will always appear later in the list.
Finally we reverse the given mora back to the correct order.
Reflection
There were some pretty painful things here. When I first started I immediately realized there were some very important functions missing. I added them to the repo, although they are not present above.
I won't go into much detail but they would have saved 3 bytes overall:
v=xay"iu@oEOa"
rv<gj<gP(h_*>l2p(p<v)(nA v*>hds++pM""en++asy["St","Zd"]))<rv
However there are still some things that could be improved:
- pM""en is so painful. It's just an extremely expensive way to say the end of the list. en should in theory do that, but it returns a () instead and we need it to be an empty list. A version of en that returns an empty list would be useful.
- We really shouldn't have to do gj<<gP. There should be a built-in way to get the highest / lowest priority parse, and only that.
C# (Visual C# Interactive Compiler), 193 bytes
s=>System.Text.RegularExpressions.Regex.Match(s,$"^((?<P>{C}?{V}){V}?{C}?)((?<P>{C}{V}){V}?{C}?)*?({C}{V})?$").Groups["P"].Captures[^1].Value;string T="iu@oEOa",C=$"([^{T}][SZ]?)",V=$"([{T}])";
^((?<P>{C}?{V}){V}?{C}?)((?<P>{C}{V}){V}?{C}?)*?({C}{V})?$
This regex does all of the work. C and V stand for consonants and vowels respectively.
((?<P>{C}?{V}){V}?{C}?)
The first part matches the first syllable, and captures its primary mora in capturing group P.
((?<P>{C}{V}){V}?{C}?)*?
This part matches the next syllables, again capturing their primary mora in capturing group P. Note the lazy quantifier at the end.
({C}{V})?
This matches a single-mora syllable at the very end, if any.
We can then take the final mora captured within P, since it is guaranteed to be primary, and also guaranteed not to be the final mora (since that would either be captured within the last part, or it would be non-primary).
Python3, 650 bytes:
lambda x:[*s(T(x),1)][0]
T=lambda x,c=[]:c if''==x else T(x[len((t:=[(a,b)for a,b in[('C','tS'),('C','dZ')]+[('C',i)for i in'mnJNptkbdgsx4ljw']+[('V',i)for i in'iu@oEOa']if x[:len(b)]==b][0])[1]):],c+[t])
P=lambda x:''.join(i[1]for i in x)
def M(w):
 w=[i for j in w for i in j]
 for i in[-2,-3,-4]:
  if w[i][0]:return w[i][1]
def s(t,l,c=[]):
 if[]==t:yield M([[(1,P(i[:I[0]+1])),*[(0,P([j]))for j in i[I[0]+1:]]]if(I:=[x for x,(a,_)in enumerate(i)if'V'==a])else i for i in c]);return
 for i in'CVVC CVV CVC CV VVC VV VC V'.split():
  if len(t)>=len(i)and(i[0]!='V'or l)and all(a==k for(a,b),k in zip(t,i)):yield from s(t[len(i):],0,c+[t[:len(i)]])