Bytes	Lang	Time	Link
251	Setanta	240719T203134Z	bb94
195	Lexurgy	220119T070008Z	bigyihsu
438	C gcc	220118T080206Z	Alexandr
086	Python 3	220114T153050Z	Jakque
051	Jelly	220114T132149Z	Jonathan
055	Charcoal	220114T123027Z	Neil
064	Perl 5 p	220114T211748Z	Xcali
313	TypeScript type system	220114T192202Z	Merlin04
047	Pip	220114T152555Z	DLosc
049	05AB1E	220114T151652Z	Kevin Cr
047	Retina 0.8.2	220114T121400Z	Neil

Setanta, ~~260~~ 251 bytes

Sure, this could be done a lot shorter in Raku, but what’s the fun in that?

gniomh(f){s:=0m:=""le i idir(0,fad@f){l:=f[i]c:=aimsigh@(go_liosta@"nmptkswlj"())(l)+1v:=aimsigh@(go_liosta@"aeiou"())(l)+1b:=0ma c{b=s==1s=(s&c<2&3)|1}no ma v{b=s==2s=2}no b=1ma b|aimsigh@["wu","wo","ji","ti","nm","nn"](m+l)+1{s=0bris}m=l}toradh s>1}

−9 bytes because whoops Setanta isn’t Raku

Try it on try-setanta.ie

Lexurgy, 195 bytes

Lexurgy is a tool made for conlangers for applying sound changes, so this is perfect for this challenge! ~~(and here I am bashing it into code golf)~~

Outputs the original word if it's valid Toki Pona, and an empty string otherwise.

Extremely slow version:

Class c {m,n,p,t,k,s,w,l,j}
Class v {a,e,i,o,u}
a:
{({j,t} i),(w {o,u}),({m,n} {m,n}),!@c&!@v}=>`
{(!n&@c @c),(@v @v)}=>` *
!@v&!n=>`/_ $
n=>`/$ _ $
c propagate:
[]=>`/{` _,_ `}
d:
`=>*

Much faster version, 199 bytes:

Class c {m,n,p,t,k,s,w,l,j}
Class v {a,e,i,o,u}
a:
{j,t} i=>`
w {o,u}=>`
{m,n} {m,n}=>`
!n&@c @c=>` *
@v @v=>` *
!@v&!n=>`/_ $
n=>`/$ _ $
!@c&!@v=>`
c propagate:
[]=>`/{` _,_ `}
d:
`=>*

Ungolfed:

Class cons {m,n,p,t,k,s,w,l,j}
Class vow {a,e,i,o,u}

remove-forbidden:
 {j,t} i => ` # ji, ti
 w {o,u} => ` # wo, wu
 {m,n} {m,n} => ` # mn, mm, etc
 !n&@cons @cons => ` * # no consecutive consonants
 @vow @vow => ` * # no consecutive vowels
 !@vow&!n => ` / _ $ # ending with a vowel or n
 n => ` / $ _ $ # nothing of length 1
Then:
 !@cons&!@vow => ` # convert any invalid character
Then propagate:
 [] => ` / {` _, _ `} # spread the invalid
Then:
 ` => * # delete the invalid
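
For readers who don't know Lexurgy, the ungolfed rules above can be read as a checklist. Here is a rough Python sketch of that checklist. Python is used purely for illustration, and `lexurgy_like`, `CONS` and `VOWELS` are made-up names, not part of the answer. The sketch collapses the mark / propagate / delete steps into a single flag and follows the rules literally, so it is only as strict as they are.

```python
CONS = set("mnptkswlj")
VOWELS = set("aeiou")


def lexurgy_like(word: str) -> str:
    """Return the word unchanged if no rule fires, otherwise an empty string."""
    pairs = list(zip(word, word[1:]))  # every pair of adjacent letters
    invalid = (
        any(a in "jt" and b == "i" for a, b in pairs)                  # ji, ti
        or any(a == "w" and b in "ou" for a, b in pairs)               # wo, wu
        or any(a in "mn" and b in "mn" for a, b in pairs)              # mn, mm, nm, nn
        or any(a in (CONS - {"n"}) and b in CONS for a, b in pairs)    # consonant clusters
        or any(a in VOWELS and b in VOWELS for a, b in pairs)          # vowel clusters
        or (word != "" and word[-1] not in (VOWELS | {"n"}))           # must end in vowel or n
        or word == "n"                                                 # a lone n
        or any(ch not in (CONS | VOWELS) for ch in word)               # foreign characters
    )
    # Marking, propagating and deleting the backtick boils down to this:
    return "" if invalid else word


print(lexurgy_like("toki"))  # prints "toki"
print(lexurgy_like("tokk"))  # prints an empty line
```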

C (gcc), 438 bytes

#define R return
int c(l){char a[]={'n','m','p','t','k','s','w','l','j'};for(int i=0;i<9;i++)if(l==a[i])R 1;R 0;}
int v(l){R l==97||l==101||l==105||l==111||l==117?1:0;}
int f(char* s){int i,a,b;for(i=0;*s!=0;s++,i++){a =*s;b=*(s+1);if(!(c(a)||v(a))||((a=='j'||a=='t')&&b=='i'||a=='w'&&(b=='u'||b=='o')||a=='n'&&(b=='n'||b=='m'))||(c(a)&&c(b)&&a!='n')||(v(a)&&v(b))) R 0;}if(i==1&&c(*(s-1))) R 0;if(*s==0&&v(*(s-2))&&*(s-1)!='n') R 0;R 1;}

Try it online!

Explanation:

#define R return
// function to detect a consonant
int c(l){char a[]={'n','m','p','t','k','s','w','l','j'};for(int i=0;i<9;i++)if(l==a[i])R 1;R 0;}
// function to detect a vowel
int v(l){R l==97||l==101||l==105||l==111||l==117?1:0;}

int f(char* s){int i,a,b;for(i=0;*s!=0;s++,i++)
{
    a =*s;b=*(s+1);
    if(!(c(a)||v(a))||      // reject characters that are not allowed
    ((a=='j'||a=='t')&&b=='i'||a=='w'&&(b=='u'||b=='o')||a=='n'&&(b=='n'||b=='m'))|| // reject the sequences ji, ti, wu, wo, nn and nm
    (c(a)&&c(b)&&a!='n')||  // reject two consecutive consonants (unless the first is 'n')
    (v(a)&&v(b)))           // reject two consecutive vowels
    R 0;
}
    if(i==1&&c(*(s-1))) R 0;    // reject a single-letter word that is a consonant
    if(*s==0&&v(*(s-2))&&*(s-1)!='n') R 0;  // reject a word ending in a vowel followed by a consonant other than 'n'
    R 1;
}

Python 3, ~~97~~ ~~88~~ 86 bytes

lambda x:re.sub("((?!ji|wu|wo|ti|.*n[nm])(^|[j-npstw])[aeiou]n?)*$","",x)>""
import re

Try it online!

Returns False for a valid word and True for an invalid one.

Thanks to @14m2 for -2 bytes

How it works:

At each syllable, we check for ji|wu|wo|ti and refuse to capture anything if one of them is present. We also check whether nn or nm occurs anywhere later in the word.
If none of these is found, we capture the syllable (consonant + vowel (+ n)).
All the captured syllables are replaced by the empty string.
We then check whether the result is greater than the empty string (True: something was left over, so the word is invalid) or equal to it (False: the word is valid).
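
The same pattern can be spelled out with `re.VERBOSE` to make the steps above easier to follow. This is only a sketch: the pattern is the one from the lambda, while the names `SYLLABLES` and `is_invalid` are made up for illustration.

```python
import re

# The answer's pattern, spread out so that each part can be annotated.
SYLLABLES = re.compile(r"""
    (                              # one syllable:
        (?! ji | wu | wo | ti      #   it must not start with a forbidden pair,
          | .*n[nm] )              #   and no nn / nm may occur from here onwards
        ( ^ | [j-npstw] )          #   a consonant, or nothing at the very start
        [aeiou]                    #   a vowel
        n?                         #   an optional final n
    )*                             # any number of syllables ...
    $                              # ... reaching the end of the word
""", re.VERBOSE)


def is_invalid(word: str) -> bool:
    """Mirror the lambda: True for an invalid word, False for a valid one."""
    leftover = SYLLABLES.sub("", word)  # valid syllable runs at the end are deleted
    return leftover > ""                # anything left over means the word is invalid


print(is_invalid("toki"))   # False
print(is_invalid("kanna"))  # True
```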

Jelly, ~~56~~ 51 bytes

+1 to cater for strict IO (two distinct outputs rather than truthy vs falsey being allowed)

“jtklmnpsw”,ØẹŒpṖṖ¬3,8¦p”n;ƊṗⱮLẎF€⁾mnyw⁾nnƲÐḟḊ€;$e@

A (very inefficient) monadic Link that yields 0 when the input string is not a Toki Pona word and 1 when it is.

(Don't) Try it online! (it's so inefficient it'll only complete for words of length three or less!)

...but here is a test-suite that covers every test except the four-syllable pankulato. It (a) limits generation to three base-syllables rather than one per character of the input string, and (b) calls the word-generating code only once for all tests (hence the e@ has been moved out to the footer).

How?

We construct a list containing ALL valid Toki Pona words constructed from at most length(input) syllables and check if the input is in there.

Yep that's soooo nasty, but without easy regex access I imagine it's the golfiest way.
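
As a rough illustration of the generate-and-test idea in a more readable language, here is a Python sketch. It is not a translation of the Jelly link: the helper names and the `max_syllables` parameter are assumptions, added so that the example can be run on small inputs.

```python
from itertools import product

CONSONANTS = "jtklmnpsw"
VOWELS = "aeiou"
FORBIDDEN = {"ji", "ti", "wu", "wo"}


def base_syllables():
    """All consonant+vowel syllables, plus the same syllables with a final n."""
    cv = [c + v for c, v in product(CONSONANTS, VOWELS) if c + v not in FORBIDDEN]
    return cv + [s + "n" for s in cv]


def all_words(max_syllables):
    """Every word built from 1..max_syllables syllables (grows exponentially)."""
    syllables = base_syllables()
    words = set()
    for count in range(1, max_syllables + 1):
        for combo in product(syllables, repeat=count):
            word = "".join(combo)
            if "nn" in word or "nm" in word:
                continue  # a final n may not be followed by n or m
            words.add(word)
            words.add(word[1:])  # the first syllable may drop its consonant
    return words


def is_toki_pona(word, max_syllables=None):
    """Membership test; max_syllables defaults to len(word), as in the answer."""
    if max_syllables is None:
        max_syllables = len(word)
    return word in all_words(max_syllables)


print(is_toki_pona("toki", max_syllables=2))  # True
```

With 82 base syllables the candidate list grows as 82^n, which is why the real link only completes for very short words.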

“jtklmnpsw”,ØẹŒpṖṖ¬3,8¦p”n;Ɗṗ - (partial) Link: integer (from below!)
“jtklmnpsw”                   - "jtklmnpsw"
            Øẹ                - "aeiou"
           ,                  - pair
              Œp              - Cartesian product
                ṖṖ            - pop off "wu" and "wo"
                   3,8¦       - apply to indices 3 & 8 ("ji" & "ti"):
                  ¬           -   logical NOT (replace these with [0,0] (integers))
                           Ɗ  - last three links as a monad:
                        ”n    -   'n'
                       p      -   Cartesian product (appends 'n' to each)
                          ;   -   concatenate
                            ṗ - Cartesian power (the integer)

...ⱮLẎF€⁾mnyw⁾nnƲÐḟḊ€;$e@ - (continued) Link: string, S
... L                     - length of S
...Ɱ                      - map across [1..length(S)] with:
...                       -   code above -> base-syllable combos of each length
     Ẏ                    - tighten
      F€                  - flatten each
                 Ðḟ       - filter discard those for which:
                Ʋ         -   last four links as a monad:
        ⁾mn               -     "mn"
           y              -     translate (convert ms to ns)
             ⁾nn          -     "nn"
            w             -     index of first occurrence (or zero)
                      $   - last two links as a monad:
                   Ḋ€     -   dequeue each
                     ;    -   concatenate
                        @ - with swapped arguments:
                       e  -   S exists in there?

Charcoal, ~~59~~ ~~58~~ 55 bytes

∧θ¬⊙⪪”&↧q1o⁺VＰα”²№θι≔aeiouηＦ⮌θ¿№ηι≔⁻”&↧ï⁸t∕p№t⟦”ηη¿⁻ιn⎚

Try it online! Link is to verbose version of code. Explanation:

∧θ¬⊙⪪”&↧q1o⁺VＰα”²№θι

Check that the word doesn't contain any of the illegal letter pairs contained in the compressed string.

≔aeiouη

Start by expecting the last character to be a vowel.

Ｆ⮌θ

Loop over the word in reverse.

¿№ηι

If we see an expected letter, ...

≔⁻”&↧ï⁸t∕p№t⟦”ηη

... then flip the set of expected letters by subtracting it from the string of all the legal Toki Pona letters, grouped into vowels and consonants.

¿⁻ιn

Otherwise, if the current letter is not an n, ...

⎚

... then erase any previous validity there might have been.

Perl 5 `-p`, 64 bytes

$_=!/[jt]i|wu|wo|nm|nn/&&/^([aeiou]n?)?([mnptkswlj][aeiou]n?)*$/

Try it online!

TypeScript type system, 313 bytes

type v="a"|"e"|"i"|"o"|"u";type i<T>=T extends""?1:T extends`${Exclude<`${"m"|"n"|"p"|"t"|"k"|"s"|"w"|"l"|"j"}${v}`,"ji"|"wu"|"wo"|"ti">}${infer r}`?i<r>extends 1?1:r extends`n${infer e}`?e extends`${"n"|"m"}${any}`?0:i<e>:0:0;type o<T>=T extends`${v}${infer p}`?i<p>extends 1?1:p extends`n${infer r}`?i<r>:0:i<T>

This is written entirely with TypeScript types - the o type outputs 1 if the input parameter is a valid word and 0 if it is not. There's probably some room for further golfing.
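
To make the recursion easier to follow, here is a loose Python sketch of the same idea: peel off one syllable at a time and recurse on the rest. It is not a line-by-line transcription of the types. The function names are invented, and the check that a final n is not followed by n or m is applied after every syllable, which is an assumption based on the illegal sequences listed in the other answers.

```python
CONSONANTS = "mnptkswlj"
VOWELS = "aeiou"
FORBIDDEN = ("ji", "wu", "wo", "ti")


def rest_is_valid(rest: str) -> bool:
    """True if `rest` is a (possibly empty) run of consonant-initial syllables."""
    if rest == "":
        return True  # an empty remainder is accepted, like the `T extends ""` case
    if len(rest) < 2 or rest[0] not in CONSONANTS or rest[1] not in VOWELS:
        return False
    if rest[:2] in FORBIDDEN:
        return False
    return after_vowel(rest[2:])


def after_vowel(tail: str) -> bool:
    """After a vowel, either the next syllable starts here or a final n comes first."""
    if rest_is_valid(tail):
        return True
    if tail.startswith("n"):
        after_n = tail[1:]
        # A syllable-final n may not be followed by n or m.
        return not after_n.startswith(("n", "m")) and rest_is_valid(after_n)
    return False


def is_toki_pona(word: str) -> bool:
    """Only the first syllable may start with a vowel."""
    if word.startswith(tuple(VOWELS)):
        return after_vowel(word[1:])
    return rest_is_valid(word)


print(is_toki_pona("anpa"))  # True
print(is_toki_pona("anna"))  # False
```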

Pip, ~~56~~ ~~53~~ 47 bytes

-3 bytes by porting Neil's Retina answer

X<>"jiwuwotinnnm"NIa&a~=+:`^|[j-nptsw]`+XV.`n?`

Returns 1 for a valid word, 0 for an invalid word. Attempt This Online!

Verify all test cases

Explanation

At its core, this solution works similarly to Neil's Retina answer:

The input does not contain any of the illegal sequences ji, wu, wo, ti, nn, or nm; AND
The input fully matches the regex ((^|[j-nptsw])[aeiou]n?)+
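
Those two conditions translate almost directly into Python's `re` module. A minimal sketch, assuming `re.fullmatch` as the stand-in for Pip's `~=`; the names `ILLEGAL_PAIRS`, `SYLLABLES` and `is_toki_pona` are hypothetical:

```python
import re

# Condition 1: none of the illegal two-letter sequences may appear anywhere.
ILLEGAL_PAIRS = re.compile(r"ji|wu|wo|ti|nn|nm")

# Condition 2: the whole word is one or more syllables. Only the first
# syllable can take the `^` branch, i.e. start without a consonant.
SYLLABLES = re.compile(r"((^|[j-nptsw])[aeiou]n?)+")


def is_toki_pona(word: str) -> bool:
    """True when both conditions hold for the word."""
    if ILLEGAL_PAIRS.search(word):
        return False
    return SYLLABLES.fullmatch(word) is not None


print(is_toki_pona("toki"))   # True
print(is_toki_pona("kanna"))  # False
```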

First half:

X<>"jiwuwotinnnm"NIa
   "jiwuwotinnnm"     That string
 <>                   Grouped into pairs of characters
X                     Converted to a regex that matches any of those pairs
                 NI   Does not match in
                   a  The command-line argument

Second half:

a~=+:`^|[j-nptsw]`+XV.`n?`
     `^|[j-nptsw]`          That regex
                  +         Wrapped in a non-capturing group and followed by
                   XV       Built-in regex `[aeiou]`
                     .      Followed by
                      `n?`  That regex
   +:                       Apply the + quantifier to the above wrapped in n.c. group
a~=                         Command-line argument fully matches that regex

05AB1E, 49 bytes

„nn„nm‚åà≠×ε.•2Ñ|qγù•žMâ¨¨D27SèKD'n««N>ãJ}˜D€¦«Iå

Port of @JonathanAllan's Jelly answer, but even slower.. :/
Outputs 1/0 for accept/reject respectively.

Try it online.
As is, it's too slow for a test suite, but by adding 2äн between the × and ε (to map over half the input-length instead), we can verify all but the longest few truthy and falsey test cases, in separate test suites.

Explanation:

„nn„nm‚               # Push pair ["nn","nm"]
       åà≠            # Check that NEITHER is present in the (implicit) input
          ×           # 'Multiply' it by the (implicit) input-string
                      # (the input if truthy; "" if falsey)
ε                     # Map over the characters:
 .•2Ñ|qγù•            #  Push compressed string "jtklmnpsw"
          žM          #  Push builtin vowels "aeiou"
            â         #  Pop both, and create a list of all possible char-pairs
             ¨¨       #  Remove the last two ("wu" and "wo")
               D      #  Duplicate the list
                27S   #  Push pair [2,7]
                   è  #  Index those into the copy: ["ji","ti"]
                    K #  Remove those as well
 D                    #  Duplicate the list again
  'n«                '#  Append an "n" to each string
     «                #  Merge the two lists together
 N                    #  Push the 0-based map-index
  >                   #  Increase it by 1 to make it 1-based
   ã                  #  Cartesian product this index on the list of syllables
    J                 #  Join each inner list together to a string
}˜                    # After the map: flatten the list of lists
  D                   # Duplicate the list
   €¦                 # Remove the first consonant from each
     «                # Merge the two lists together
Iå                    # Check if the input-string is in this list
                      # (after which the result is output implicitly)

See this 05AB1E tip of mine (section How to compress strings not part of the dictionary?) to understand why .•2Ñ|qγù• is "jtklmnpsw".

Retina 0.8.2, ~~48~~ 47 bytes

A`ji|nm|nn|ti|wu|wo
^((^|[j-npstw])[aeiou]n?)+$

Try it online! Link includes test cases. Edit: Saved 1 obvious byte thanks to @ovs. Explanation:

A`ji|nm|nn|ti|wu|wo

Delete invalid inputs.

^((^|[j-npstw])[aeiou]n?)+$

Match valid inputs that weren't invalidated above.
