| Bytes | Lang | Time | Author |
|---|---|---|---|
| 429 | JavaScript (Node.js) | 250828T043552Z | Arnauld |
| 386 | sqlite3 | 250830T185948Z | DPenner1 |
| 433 | Python 3 | 250829T014821Z | Ted |
| 222 | Jelly | 250831T172333Z | Jonathan |
| 269 | Charcoal | 250828T095410Z | Neil |
JavaScript (Node.js), 429 bytes
Outputs the table in Simplified Chinese.
f=(k=j=129,a=(B=Buffer)("gQOYACg+A/Hnsp58QAaDBwp0Fd6nvqQA3G8CqmBJssMw3zRyIkte3UHTqAD60gIeEDDydwlIvdokfodHwmNkAIiLnCKJXNxCSB33d61Nl8e8oPSizhDFCnl3KRYIJ+eEBq+3VyuUAztrF1OvH/YRUnTjsbSRvbY/VbpRt20TfrOvu4qzm7a8rae3JoyrYiwqmjrbRCKTLO4NL9upe36t8/2bGHYGkFyVpIQCq6ywsbqgoaKjh5KTlJW/","base64"))=>k--?f(k)+(n=a[k]*4^a[552+k>>2]>>k%4*2,n<9?n?' '.repeat(n*6|8):`
`:B(["553339999"[i=n&15]*5%31^233,a[i+171],n/16+128,...i<2?[a[j++]]:[]])):""
Encoding
For each element name, the Unicode code point is broken down into:
- a 2-byte prefix
- a 3rd byte in the range \$[128\dots191]\$
- a 4th byte in the same range if the UTF-8 encoding starts with 0xF0
There are 15 distinct prefixes.
Hence the following 10-bit encoding:
vvvvvvpppp
\____/\__/
| |
| '--> prefix identifier
'-------> value of 3rd byte, minus 128
The upper 8 bits and the lower 2 bits of these 10-bit values are packed into two separate byte streams.
Special values lower than 9 are used for line-feeds and repeated whitespace.
If the prefix identifier is either 0 or 1, a 4th byte is taken from a separate lookup table.
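A minimal Python sketch of this packing, with hypothetical helper names (`pack`, `split_streams`); it illustrates the stream layout only and ignores the extra XOR mixing the actual encoder applies to the 8-bit stream:

```python
# 10-bit value: upper 6 bits = third byte minus 128, lower 4 bits = prefix id
def pack(third_byte, prefix_id):
    assert 128 <= third_byte <= 191 and 0 <= prefix_id < 15
    return (third_byte - 128) << 4 | prefix_id

# Split 10-bit values into the two byte streams described above:
# the upper 8 bits go to one stream, and the lower 2 bits of four
# consecutive values are packed into a single byte of the other stream.
def split_streams(values):
    upper = [v >> 2 for v in values]
    lower = [sum((values[i + j] & 3) << (2 * j)
                 for j in range(min(4, len(values) - i)))
             for i in range(0, len(values), 4)]
    return upper, lower

v = pack(0x96, 7)                       # third byte 0x96, prefix id 7
upper, lower = split_streams([v, v, v, v])
```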
Byte array
All data streams are concatenated into a single byte array, which is encoded in Base64:
| Offset | Size | Description | Pointer variable |
|---|---|---|---|
| 0 | 129 | the upper 8 bits of the 10-bit values | k |
| 129 | 9 | the fourth bytes | j |
| 138 | 33 | the lower 2 bits of the 10-bit values | k |
| 171 | 15 | the 2nd bytes of the prefixes | i |
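A sketch of slicing the decoded array by these offsets (Python, with a hypothetical `slice_streams` helper and dummy data of the right total length):

```python
# Offsets and sizes from the table above: 129 + 9 + 33 + 15 = 186 bytes total
def slice_streams(data):
    assert len(data) == 186
    return {
        "upper8": data[0:129],    # upper 8 bits of the 10-bit values (k)
        "byte4":  data[129:138],  # the fourth bytes (j)
        "lower2": data[138:171],  # lower 2 bits, four per byte (k)
        "byte2":  data[171:186],  # 2nd bytes of the prefixes (i)
    }

parts = slice_streams(bytes(range(186)))  # dummy data, not the real stream
```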
First bytes of the prefixes
The first bytes of the prefixes are not stored in the byte array. They're computed with a dedicated and slightly shorter formula instead:
"553339999"[i]*5%31^233
5, 3 and 9 are converted to 0xF0 (240), 0xE6 (230) and 0xE7 (231) respectively. For \$9\le i\le14\$, the lookup falls off the end of the string and the expression defaults to 0xE9 (233).
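The same formula, transcribed to Python (the out-of-range branch is made explicit here; in JavaScript the lookup yields NaN, and NaN ^ 233 is 233):

```python
# First byte of each prefix, from the digit string in the formula above.
def first_byte(i):
    s = "553339999"
    if i < len(s):
        return int(s[i]) * 5 % 31 ^ 233
    return 233  # in JS: undefined * 5 % 31 is NaN, and NaN ^ 233 == 233

assert first_byte(0) == 0xF0  # '5' -> 240
assert first_byte(2) == 0xE6  # '3' -> 230
assert first_byte(5) == 0xE7  # '9' -> 231
assert first_byte(12) == 0xE9 # default for out-of-range indices
```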
Encoder
This code generates the Base64 data string, along with the offsets and sizes of each part.
const A = [...
  "氢 氦\n" +
  "锂铍 硼碳氮氧氟氖\n" +
  "钠镁 铝硅磷硫氯氩\n" +
  "钾钙 钪钛钒铬锰铁钴镍铜锌镓锗砷硒溴氪\n" +
  "铷锶 钇锆铌钼锝钌铑钯银镉铟锡锑碲碘氙\n" +
  "铯钡镧铈镨钕钷钐铕钆铽镝钬铒铥镱镥铪钽钨铼锇铱铂金汞铊铅铋钋砹氡\n" +
  "钫镭锕钍镤铀镎钚镅锔锫锎锿镄钔锘铹𬬻𬭊𬭳𬭛𬭶鿏𫟼𬬭鿔鿭𫓧镆𫟷鿬鿫"
];

// collect the distinct UTF-8 prefixes
const pfxSet = new Set;
for(const c of A) {
  if(c != ' ' && c != '\n') {
    pfxSet.add(getPrefixKey(Buffer.from(c)));
  }
}
const pfx = [...pfxSet].sort();

// build the 10-bit values and the stream of 4th bytes
const _10bit = [];
const byte4 = [];
for(let i = 0; i < A.length; i++) {
  const c = A[i];
  if(c == ' ') {
    let j = i;
    while(A[i + 1] == ' ') {
      i++;
    }
    _10bit.push(({ 14: 1, 24: 4, 30: 5 })[i + 1 - j]);
  }
  else if(c == '\n') {
    _10bit.push(0);
  }
  else {
    const a = Buffer.from(c);
    const pfxId = pfx.indexOf(getPrefixKey(a));
    _10bit.push(pfxId | a[2] - 128 << 4);
    if(pfxId < 2) {
      byte4.push(a[3]);
    }
  }
}

// pack the lower 2 bits (4 values per byte) and the upper 8 bits
const _2bit = _10bit.flatMap((_, i) =>
  i & 3 ? [] : [ 3, 2, 1, 0 ].reduce((p, v) => p << 2 | _10bit[i + v] & 3, 0)
);
const _8bit = _10bit.map((n, i) => n >> 2 ^ _2bit[i >> 2] >> i % 4 * 2 + 2);
const byte2 = pfx.map(s => +s.split("/")[2]);

// concatenate all streams and report offsets / sizes
const data = [ _8bit, byte4, _2bit, byte2 ];
let ptr = 0;
data.forEach((arr, i) => {
  console.log(
    [ "8-bit", "4th bytes", "2-bit", "2nd bytes" ][i].padEnd(9) + ": " +
    "offset = " + ptr.toString().padStart(3) + ", " +
    "size = " + arr.length.toString().padStart(3)
  );
  ptr += arr.length;
});
console.log("\n" + Buffer.from(data.flat()).toString("base64"));

function getPrefixKey(a) {
  return [ a.length == 4 ? 0 : 1, a[0], a[1] ].join("/");
}
sqlite3, 386 bytes
Edit: New solution. It turns out it's hard to beat pre-existing compression (though the CLI's built-in zipfile was beatable). I've left my previous custom solution below.
Simplified Chinese table with double ASCII spaces. Uses the Brotli compression from the sqlite-compressions extension (v0.3.7); place libsqlite_compressions.so in the relevant library path.
.load libsqlite_compressions
SELECT brotli_decode(base85('.fofES<?GrK&%y$?Eu7B5O<+#[1JPi7\P?9Ky7hCrcLVMoC$Zt18`x1DzZb/2_qe;Jr2,L*xLeo9UVzW+-E?]rZ=1De;0iyoMOmW+*k+Y6uk<xFiPw1TEcj/vb/mTSOsx&]tRan#WZ-b&0W7Mu=V=1]h7d?Dxk;eHc>JS;Z2r84B1;c$$<]bu8?,S/<>J]WE_5Ss$IUNQH*5sXztnj*\x[Qnl60n+a.9B`dTOGECzFTnB[=tSGNh[Qe*f53MGX928OmAF:MbZS/&m049;2xA[o&^Qyd]/vqd,:u*^_5B&DV]k1E2^ao27gSzSCRHwCP;>T6a'));
(Edit: I'm not a regular; it took me a bit to understand the general rules on using external libraries and byte counting. I think I've got it now, but let me know if I've done anything wrong.)
sqlite3 - custom compression solution - 497 494 482 479 bytes
Traditional Chinese table with ideographic spaces. Uses the regexp extension; place regexp.so in the relevant library path.
.load regexp
PRAGMA encoding=utf16;SELECT unhex(regexp_replace(hex(base85(format('6>ce^%.53c/O^gHU$T=t<a%.41ciqQX4n*On5q*02VEb.?%.43czWwEe716Er,&Yl5:J%.26cEOQrpslhfdgnE35slYO`?h8r1ubnV\xYg,Kt[QG%.22c*K:rm9XvH$QqW#AAwEVq>Vj.r_fzP1Qsj\zVR-,0D$:TbDnw\iV[qI*X,6YwLx,[FOhCH/psbO^f-yi:GqW,xZObTuJkc&BrGyE96jkJG`rP9\L]2-LYxP&s1]zGz.D=Qba#l&>&&wq@w>_RPu0X8_5Jm=s2Llm3%%%%8mi^9PmNC11mghx3UV;tZUF&/fU5G0vU8jm1$p','#','#','#','#','#'))),'88.*A|(\w)(\w\w)(?=.*88.*\1(\w\w)A)','${1:+$2$3}'));
Explanation
I thought a UTF-16(LE) based approach might be more efficient for Chinese characters. The Traditional Chinese table's UTF-16 code units have 16 distinct leading bytes, so I compressed each code unit to 1.5 bytes (3 hex digits), using a lookup table. Crucially, 0x30 is mapped to 0 so that the ideographic spaces are just consecutive zeroes for the base85 encoding + string format.
The last part of the periodic table, where surrogate pairs start, contains rare leading bytes and was more efficient to leave as-is. The lookup table is inserted before this part, and a regular expression performs the substitutions. The lookup table starts with `88`, which does not otherwise appear in the encoded text, and its entries are separated by `A`, which does not occur past the lookup table.
Notably, this compressed string is 5 bytes shorter than the base85 Brotli string.
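The nibble scheme described above can be sketched in Python. This is illustrative only: the lookup table here is built from the input text itself rather than taken from the answer, relying on the fact that 0x30 (the high byte of U+3000, the ideographic space) sorts below the high bytes of CJK characters:

```python
# Each UTF-16 code unit -> 3 hex digits: one nibble indexing the high
# byte in a <=16-entry table, plus two hex digits for the low byte.
def compress(text):
    units = [ord(c) for c in text]
    table = sorted({u >> 8 for u in units})
    assert len(table) <= 16
    # 0x30 sorts first among CJK text, so U+3000 (0x30 0x00) compresses
    # to "000" and runs of spaces become runs of zero digits
    return table, "".join(f"{table.index(u >> 8):x}{u & 0xff:02x}" for u in units)

table, digits = compress("\u3000\u3000\u6c22")  # two ideographic spaces + 氢
```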
Possible areas for improvement
- Encoding: The sqlite3 CLI documentation led me to believe that UTF-16 would be the default console output on Windows, but I was not able to get that to work myself, so the `PRAGMA` was necessary. Additionally, the utf16 pragma defaults to native endianness. Big endian might shave one byte, as it resulted in one fewer `%` that needed to be escaped in the surrogate part.
- Based on the pragma, the library import and `regexp_replace` not being that short, I started to wonder whether `perl` or some `sqlite3` + `perl` combination might achieve a lower byte count with my method. Feel free to steal this idea; I might not get to it for a bit!
Python 3, 433 bytes
import zlib,base64
exit(zlib.decompress(base64.b85decode('c$}q>TTTK&5Jmr4g-aD8qC8Zpx)prEpa?R9G(sRMI=ku4`C|#z3QUNV+@v0<Q#TdE=;$92!%GEMLpz6Vrfio+s~Fy5n8YxM;ktqh@zp~?=tIit>1ms?JcdOKuLk--`hPCT47o!DEpP#?B5QmHy~nljin~ceLrVNy$1tm)TioVvWCM4Kwvip~0coRMWPxt*=jZ^J;ks$`o<_YG`i5K}8GeGE;irg4w#XG~iGjZHMk!D;ImR#XW7KSaBU99FG|(klqtT7aN3@C7(H7E5!;cs;dqa+&;|$lqKcmO^3*;7G$2nY%yTFzB6U53M`mryHuYECZnpUPe(}wbLpAU96no~N;`99YP{?wXV%7QZg1=<=NJp')).decode())
Simplified Chinese
An obvious answer, but quite efficient on bytes! (The uncompressed table is 687 bytes.)
Note: the output on TIO doesn't handle the spaces and the Chinese characters properly, so it looks wonky. I am using two ASCII spaces instead of each ideographic space. When I run it in a normal terminal, it looks correct.
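For reference, the encoding side is just the reverse pipeline. A hypothetical round trip (using a two-row stand-in instead of the full 687-byte table):

```python
import zlib, base64

table = "氢  氦\n锂铍  硼碳氮氧氟氖\n"  # stand-in text, not the full table
blob = base64.b85encode(zlib.compress(table.encode(), 9))

# Decoding mirrors the answer's zlib.decompress(base64.b85decode(...))
assert zlib.decompress(base64.b85decode(blob)).decode() == table
```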
Jelly, 222 bytes
“V&ḟhẆɠWƒ©Ñ?~ŒufṪ'Æw{_ṅµ®v(Ṅọ8Ẉɦ*ṗUÑȮḄȤẒọ6⁹ṣµʠṗ⁵ȥ~’ṃ“nƬqƲⱮċɲḅẎ{‘>200ḤżƊ;"“ṾṚ^ⱮĖẈṂȥƑ]m'ḳMḞẒ0Ẏ:+"|ĊQDọlƈẉʂı4ZȦN)çƝ\LḌ÷a?⁶ḳoḲƙƘzḌ⁷ݶ1ẋXẏ%G eæ⁽ḣ<bu¬ṫzM8⁼Ƥ¹ROṗZU[ññỌ;¤ʠøṡPṛ*ḍƥẈƒẆḅ$ƭ⁹crɓ³ƒƙḄUɼḂ⁵ẹʂȦẒ‘ḅØ⁵“¤¿Œ%7W‘œṖz⁽-%Zṙ"“¢©©ÑÑ‘ỌY
A niladic Link that yields a list of characters, or a full program that prints to stdout.
How?
“...’ṃ“...‘>200ḤżƊ;"“...‘ḅØ⁵“...‘œṖz⁽-%Zṙ"“...‘ỌY - Link: no arguments
“...’ - 1100111122200011222222222222220031222222222222222001222222222222222224222222225122220124222222222222222666667867792877
ṃ“...‘ - base-decompress using [110,152,113,153,149,232,163,212,209,123] as digits [1,...,9,0]
-> #250s = [110,110,123,123,110,...,212,163,163]
Ɗ - last three links as a monad - f(#250s):
>200 - {#250s} greater than 200?
-> [0,0,0,0,0,...,1,0,0]
Ḥ - double -> #62500s
ż - zip with {#250s}
-> [[0,110],[0,110],[0,123],[0,123],[0,110],...,[2,212],[0,163],[0,163]]
;"“...‘ - zip-wise concatenate with #1s = [186,182,94,149,194,187,179,170,146,93,109,39,217,77,195,189,48,209,58,43,34,124,192,81,68,221,108,156,227,167,25,52,90,190,78,41,23,150,92,76,173,28,97,63,134,217,111,177,161,148,122,173,135,198,127,49,247,88,248,37,71,32,101,22,141,237,60,98,117,7,245,122,77,56,140,151,129,82,79,242,90,85,91,27,27,181,59,3,165,29,244,80,222,42,213,164,187,158,207,212,36,168,137,99,114,155,131,158,161,172,85,166,191,133,214,167,190,189]
ḅØ⁵ - convert {these} from base 250
“...‘œṖ - partition {that} at 1-indices [3,11,19,37,55,87]
z⁽-% - transpose with filler 12288 (Ideographic Space code-point)
Z - transpose
ṙ"“...‘ - zip-wise rotate-left by [1,6,6,16,16] (and two implicit, trailing zeros)
Ọ - cast to characters
Y - join with newline characters
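The initial ṃ (base-decompression) step can be sketched in Python, following the digit mapping described above: digits 1–9 select table entries 1–9, and 0 selects the last entry.

```python
def base_decompress(digits, table):
    # digit d -> table[d - 1], with 0 wrapping around to the last entry
    return [table[(int(d) - 1) % len(table)] for d in digits]

table = [110, 152, 113, 153, 149, 232, 163, 212, 209, 123]
assert base_decompress("11001", table) == [110, 110, 123, 123, 110]
```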
Charcoal, 275 269 bytes
⪪⭆I⪪”}““p⟦|nD9⁼KUU⊟V≕λGN¦Φ⌈∨﹪2_↔U⊖R#5⎚⮌⟧εy∧≧Vü>C\Wγ*⁴vÀ“§⦄℅JX?‖ρ⁷q⦃3∨∕ï'¹ZIl,)✂⁷◧⁵↙h″◨ρ∕GΠmHCI↧⁷D➙C№r2ïa⌕XH↙→﹪b¬κς⌊⌈R[⁵C″⊙▷↥,}~jψ≔=⁵§ς⁶¦¤hl↶⦃∕b⬤W⁹ γτT↶QM<Y⁻QJd@ 9⁴✂'ê≦↔In⊗Πⅉ⟧´A∕]«ez≧¦$⟧⌕ia6σ/↙⌈~Þ…Yr⎚↨ηü¹XZ0?V&ê¡ê\BIHyoⅉ(;H~ς⦃¦À⧴Fς⁴êy↗¿w⟧Sa@)8⬤XêY⧴℅⊕0“dξ⁷∨4◧A▶⍘yV\”⁶℅ι³²
Try it online! Link is to verbose version of code. Outputs in Traditional Chinese. Explanation:
”...” Compressed string of code points
⪪ ⁶ Split into length 6 substrings
I Cast to integer
⭆ Map over code points and join
ι Current code point
℅ Convert to Unicode
⪪ ³² Split into length 32 substrings
Implicitly print
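In Python, the same decoding pipeline might look like this (using a hypothetical three-character input rather than the real compressed string):

```python
def decode(digits, width=32):
    # split into 6-digit decimal code points, convert, wrap at `width`
    chars = "".join(chr(int(digits[i:i+6])) for i in range(0, len(digits), 6))
    return "\n".join(chars[i:i+width] for i in range(0, len(chars), width))

# 氢 (U+6C22 = 27682), ideographic space (12288), 氦 (U+6C26 = 27686)
assert decode("027682012288027686") == "氢\u3000氦"
```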
Edit: Saved 6 bytes by removing the spaces from my compressed string. Notes on the previous 275-byte version: Simplified Chinese is 9 bytes longer. I can print null bytes and then set the background to the ideographic space, but that doesn't save any bytes. I even tried printing vertically (because that way I don't have to print the null bytes at all), but that was actually 10 bytes longer. I also tried encoding the differences between successive characters, but that was 26 bytes longer.