| Bytes | Lang | Time | Link |
|---|---|---|---|
| 188 | Python | 250529T203613Z | Willow W |
| 425 | Python3 | 250511T160158Z | Ajax1234 |
| 182 | JavaScript ES7 | 230220T021908Z | Arnauld |
| 021 | 05AB1E | 230220T171017Z | The Thon |
| 195 | JavaScript Node.js | 230220T155132Z | l4m2 |
| 017 | Vyxal | 230220T131451Z | AndrovT |
| 044 | Charcoal | 230220T100330Z | Neil |
| 051 | K ngn/k | 230220T094540Z | ovs |
Python, 188 bytes
P=lambda A,n:n and[[A[:i]]+B for i in range(1,len(A)-n+1)for B in P(A[i:],n-1)]or[[A]]
f=lambda L,c:min(P(sorted(L),c-1),key=lambda X:sum((y-sum(Y)/len(Y))**2/len(Y)for Y in X for y in Y))
The function P(A,n) recursively computes all ordered partitions of a list A into n+1 clusters, meaning P([0,1,2,3,4],1) will contain [[0],[1,2,3,4]] and [[0,1,2],[3,4]] for example, but not [[0,1,3],[2,4]]. Since we're trying to minimize the variance, unordered partitions can always be improved by an ordered partition (as long as we sort the input list L), so we only need the check those.
f(L,c) is the function that find the best partition. It takes the list L and the number of clusters c as inputs, computes all the ordered partitions with P(sorted(L),c-1), and returns the partition X that minimizes the sum of variances sum((y-sum(Y)/len(Y))**2/len(Y)for Y in X for y in Y). Of course, since all partitions have the same number of clusters, minimizing this sum is the same as minimizing the mean variance.
Python3, 425 bytes
Long, but designed to be efficient on larger values of \$l\$ and \$c\$.
v=lambda x:sum(sum((i-sum(j)/len(j))**2 for i in j)/len(j)for j in x)
def f(s,c):
L,V=0,-1
q=[(dict(enumerate(s)),[],[])]
for a,b,s in q:
if len(s)+1==c and not a:
W=v(s+[b])
if V==-1 or W<V:L=s+[b];V=W
continue
if len(s)+1>c:continue
for i in[*a]:
A={**a}
if[]==b or A[i]>=b[-1]:
O=[(b+[K:=A.pop(i)],[]),([K],[b]*bool(b))]
for o,B in O:
if V==-1 or v(s+B+[o])<=V:q+=[(A,o,s+B)]
return L
JavaScript (ES7), 182 bytes
Expects (list)(c).
a=>m=c=>eval("for(q=c**a.length;p=a.slice(-c).map((_,j)=>a.filter((_,i)=>!(q/c**i%c^j))),q--;)p.some(A=>A.map(v=>s+=(v-eval(A.join`+`)/w)**2/w,w=A.length)=='',s=0)|s>m||(m=s,o=p);o")
Commented
This is a version without eval() for readability.
a => // a[] = input list
m = // initialize m to a non-numeric value
c => { // c = number of clusters
for( // main loop:
q = c ** a.length; // start with q = c ** a.length
p = a.slice(-c) // p[] = partition of length c
.map((_, j) => // for each value in p[] at index j:
a.filter((_, i) => // for each value in a[] at index i:
!( // keep the value if
q / c ** i % c // floor(q / c ** i) mod c
^ j // is equal to j
) //
) // end of filter()
), // end of map()
q--; // stop once q = 0 has been processed
) //
p.some(A => // for each array A[] in p[]:
A.map(v => // for each value v in A[]:
s += ( // add to s:
v - // v minus
eval(A.join`+`) // the sum of all values in A[]
/ w // divided by w (the length of A[])
) ** 2 / w, // squared and divided by w again
w = A.length // initialize w
) // end of map()
== '', // trigger the some() if A[] was empty
s = 0 // start with s = 0
) // end of some(); if the result is falsy
| s > m || // and we don't have s > m:
(m = s, o = p); // update the minimum and the output array
// (implicit end of for)
return o // return the output array
} //
05AB1E, 21 bytes
{.œʒg²Q}ΣεDÅA-nÅA}O}н
Port of AndrovT's Vyxal answer.
Explanation
{.œʒg²Q}ΣεDÅA-nÅA}O}н # Implicit input
{ # Sort the first input
.œʒ } # Filter its partitions by:
g # Length of list
²Q # Equals the second nput
Σ }н # Minimum by:
ε } # Map:
ÅA # Get mean
D - # Subtract
n # Square
ÅA # Get mean
O # Sum
# Implicit output
JavaScript (Node.js), 195 bytes
f=([c,...x],n,L=P=x.slice(-n).map(_=>[]))=>1/c?L.map((_,i)=>f(x,0,L.map(x=>i--?x:[...x,c])))&&P:P=B(P)<B(L)?P:L;A=x=>x.map(t=>X+=++Y&&t,X=Y=0)+x?X/Y:1/0;B=L=>A(L.map(x=>A(x.map(v=>(v-A(x))**2))))
Vyxal, 17 bytes
søṖ'L⁰=;‡ƛṁ-²ṁ;∑∵
s # sort
øṖ # all partitions
' ; # filter by:
L # length
= # is equal to
⁰ # last input
‡ ∵ # minimum by:
ƛ ; # map:
ṁ- # subtract mean
² # square
ṁ # mean
∑ # sum
Charcoal, 45 44 bytes
I⊟⌊EΦEXηLθEηΦθ⁼λ﹪÷ιXηξη⌊ι⟦ΣEι∕ΣEλΣX⁻νλ²XLλ²ι
Try it online! Link is to verbose version of code. Explanation:
EXηLθEηΦθ⁼λ﹪÷ιXηξη
All permutations of the input list elements into c clusters...
Φ...⌊ι
... where no cluster is empty...
E...⟦ΣEι∕ΣEλΣX⁻νλ²XLλ²ι
... calculate the doubled variance of each cluster...
I⊟⌊...
... output the clusters with the minimum.
Edit: Saved 1 byte by calculating the doubled variance using the last alternative variance formula for finite populations:
$$ \textrm{Variance(X)} = \frac{1}{2N^2} \cdot \sum_{i=1}^{N} \sum_{j=1}^{N} (x_i - x_j)^2 $$