| Bytes | Lang | Time | Link |
|---|---|---|---|
| 153 | C clang | 250418T083053Z | ceilingc |
| 128 | Charcoal | 250419T222824Z | Neil |
| 165 | JavaScript Node.js | 250418T094350Z | l4m2 |
C (clang), 241 233 220 194 181 170 164 153 bytes
C,x;f(*s){int v=*s,m=64;C=v>127;do if(m<<=5,C&&*s-194>50u|(x=s[C]^128)>63|(v=v<<6|x)>>5==448|v>>5==475|v-15632<256u)return-1;while(C++,v&m);return~-m&v;}
Golfed version of the reference implementation.
Thanks to @l4m2 for -14 and a bugfix!
Closed course, professional driver.
This makes extensive use of clang-specific quirks. Kids, as always for code golf, please don't try any of these tricks in production without adult supervision.
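The three numeric comparisons in the golfed code run on `v` once the second byte has been shifted in. As a rough illustration, here is a JavaScript transcription of just those comparisons. The helper name is made up for this sketch and it is not the reference implementation; it only shows which first-two-byte combinations the constants 448, 475 and 15632 single out.

```javascript
// Hypothetical helper: mirrors only the three numeric comparisons from the
// golfed C code, applied to v = (lead << 6) | (second ^ 128).
function secondByteRejected(lead, second) {
  const v = (lead << 6) | (second ^ 128);
  return (
    // 448 << 5 equals 0xE0 << 6, so this matches E0 followed by 80..9F.
    v >> 5 === 448 ||
    // 475 << 5 equals (0xED << 6) + 32, so this matches ED followed by A0..BF.
    v >> 5 === 475 ||
    // 15632 equals (0xF4 << 6) + 16, so this matches F4 followed by 90..BF
    // (and leads above F4, which the separate *s-194>50u test already rejects).
    // ">>> 0" emulates the unsigned comparison used in the C code.
    ((v - 15632) >>> 0) < 256
  );
}
```

For example, `secondByteRejected(0xED, 0xA0)` is true (a surrogate), while `secondByteRejected(0xED, 0x9F)` is false.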
Charcoal, 128 bytes
Nθ≡θ²²⁴≔³²ζ²⁴⁰≔¹⁶ζ≔⁰ζ≡θ²³⁷≔³²ε²⁴⁴≔¹⁶ε≔⁶⁴ε¿÷⁻²⁴⁴θ⁵¹⊞υ∨±÷θ¹²⁸θ«⊞υ&θ÷³¹⊕‹²²³θW&θε«≦⊗θ≔⁻N¹²⁸η¿№…ζεη⊞υη«UMυ⁰⊞υ⊖⊟υ≔⁰θ»≔⁰ζ≔⁶⁴ε»»I⟦↨υεLυ
Try it online! Link is to verbose version of code. Prefers to input and output in decimal on separate lines but will accept hexadecimal input if formatted as shown in the link. Explanation:
Nθ≡θ²²⁴≔³²ζ²⁴⁰≔¹⁶ζ≔⁰ζ≡θ²³⁷≔³²ε²⁴⁴≔¹⁶ε≔⁶⁴ε
Input the first byte and, assuming that it is in the range 194..244, calculate the valid range for the second byte, offset by 128.
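The two switch statements above amount to a small lookup. A minimal JavaScript sketch of it follows; the function name is hypothetical and the range is returned as a half-open pair, which is an assumption about how the two variables are used later.

```javascript
// Hypothetical helper: valid half-open range [lo, hi) for the second byte
// minus 128, given a lead byte that starts a multi-byte sequence.
function secondByteRange(first) {
  const lo = first === 224 ? 32 : first === 240 ? 16 : 0;   // ζ in the Charcoal code
  const hi = first === 237 ? 32 : first === 244 ? 16 : 64;  // ε in the Charcoal code
  return [lo, hi];
}
```

So a lead byte of 224 (0xE0) only allows second bytes 0xA0..0xBF, and 244 (0xF4) only allows 0x80..0x8F.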
¿÷⁻²⁴⁴θ⁵¹⊞υ∨±÷θ¹²⁸θ«
If the first byte is not in the range 194..244 then push it, or -1 if it is 128 or more, to the base 64 result list.
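The range test relies on floor division. A sketch of the same arithmetic in JavaScript, with a hypothetical function name and `undefined` standing in for "continue with the multi-byte branch":

```javascript
// Hypothetical helper: the value pushed when the first byte cannot start a
// multi-byte sequence; returns undefined for lead bytes 194..244.
function singleByteResult(first) {
  // floor((244 - first) / 51) is 0 exactly when 194 <= first <= 244.
  if (Math.floor((244 - first) / 51) === 0) return undefined;
  // -floor(first / 128) is -1 for bytes 128..255 and zero for ASCII,
  // so the "or" falls through to the byte itself for ASCII.
  return -Math.floor(first / 128) || first;
}
```

Note that the trick needs floor semantics for negative numbers: for 245 the quotient is -1, not 0.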
⊞υ&θ÷³¹⊕‹²²³θ
Otherwise, start by pushing the lower 3 to 5 bits as appropriate of the first byte to the base 64 result list.
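As an illustration of the masking step, the following sketch (hypothetical name, same arithmetic as the Charcoal expression) picks a mask of 31 or 15 depending on the lead byte:

```javascript
// Hypothetical helper: payload bits of a lead byte in 194..244.
function leadPayload(first) {
  // 31 for leads up to 223, floor(31 / 2) = 15 for leads from 224 up.
  const mask = Math.floor(31 / (1 + (first > 223 ? 1 : 0)));
  return first & mask;
}
```

For four-byte leads (240..244) bit 3 is always clear, so masking with 15 still leaves only the three payload bits.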
W&θε«
Repeat while there are still bytes to process. (Although the variable ε is not always 64 at this point, conveniently the first loop will always execute anyway.)
≦⊗θ
Double the first byte. This shifts its next length bit into the tested position, which in effect decrements the number of remaining bytes to process.
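To see why doubling acts as a countdown, here is a small sketch that counts the loop iterations for a given lead byte. The function name is hypothetical, and it assumes every continuation byte is valid so that the mask is 64 after the first pass.

```javascript
// Hypothetical helper: number of loop iterations (continuation bytes)
// for a lead byte, given the mask used on the first pass.
function continuationCount(first, firstMask = 64) {
  let lead = first, mask = firstMask, count = 0;
  while (lead & mask) {
    lead *= 2;   // shift the next length bit into position 6
    count++;
    mask = 64;   // the mask is reset to 64 at the end of each pass
  }
  return count;
}
```

The leading 1 bits of the lead byte are consumed one per pass, so 0xC2 gives one pass, 0xE2 two and 0xF0 three. The special first-pass masks (32 for 237, 16 for 244) happen to hit a set bit as well.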
≔⁻N¹²⁸η
Get the next byte, offset by 128.
¿№…ζεη⊞υη
If it's valid then push it to the base 64 result list.
«UMυ⁰⊞υ⊖⊟υ≔⁰θ»
Otherwise, set the base 64 result list to -1 (without changing its length) and clear the number of remaining bytes to process.
≔⁰ζ≔⁶⁴ε
Set the valid range for the remaining bytes. (This also sets the base to 64 for the output, necessary when the length of the list is greater than 1.)
»»I⟦↨υεLυ
Convert the base 64 list to decimal and also output its length.
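The final conversion treats the list as base 64 digits, most significant first. A minimal sketch, with a hypothetical function name, also shows why a list of zeros ending in -1 converts to -1:

```javascript
// Hypothetical helper: interpret a list of digits in the given base,
// most significant first. Digits may be negative or exceed the base.
function fromBase(digits, base) {
  return digits.reduce((acc, d) => acc * base + d, 0);
}
```

For the bytes E2 82 AC the list is `[2, 2, 44]`, and `fromBase([2, 2, 44], 64)` gives 8364, that is U+20AC. The error list `[0, 0, -1]` gives -1 while keeping the length 3 for the second output.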
JavaScript (Node.js), 165 bytes
([b,...c],t=b*4+c[0]/16-904,k=64)=>b<128?[1,b]:t<-120^t<0^t<2^t<54^t<56^~~t==64^t<81?c.some((e,i)=>y=(e^=128)<64?(b=b-k<<6|e,k*=32,k>b&&[i+2,b]):[i+1],b-=128)&&y:[1]
- If ASCII, take 1 byte
- Assuming byte 2 in [0x80,0xc0), if first two bytes bad, reject 1 byte
- Scan through from byte 2. If out of [0x80,0xc0), reject till here
- If enough bytes, take till here
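The steps above can be sketched in ungolfed form as follows. This is a readable approximation rather than a transcription of the golfed code: the function name is made up, the output shape (`[length, codePoint]` when accepted, `[length]` when rejected) is inferred from the golfed return values, and treating a missing byte like an out-of-range byte is an assumption of this sketch.

```javascript
// Hypothetical ungolfed sketch. Input: array of byte values.
function decodeFirst(bytes) {
  const first = bytes[0];
  if (first < 0x80) return [1, first];            // ASCII: take 1 byte
  if (first < 0xC2 || first > 0xF4) return [1];   // cannot start a sequence

  // Number of continuation bytes and payload bits of the lead byte.
  const extra = first < 0xE0 ? 1 : first < 0xF0 ? 2 : 3;
  let codePoint = first & (0x3F >> extra);

  // Byte 2 has a narrower half-open range after E0, ED, F0 and F4.
  const lo = first === 0xE0 ? 0xA0 : first === 0xF0 ? 0x90 : 0x80;
  const hi = first === 0xED ? 0xA0 : first === 0xF4 ? 0x90 : 0xC0;

  for (let i = 1; i <= extra; i++) {
    const b = bytes[i];
    const min = i === 1 ? lo : 0x80;
    const max = i === 1 ? hi : 0xC0;
    // Out of range (or missing, an assumption here): reject till here.
    if (b === undefined || b < min || b >= max) return [i];
    codePoint = (codePoint << 6) | (b & 0x3F);
  }
  return [extra + 1, codePoint];                  // enough bytes: take till here
}
```

For example, `decodeFirst([0xE2, 0x82, 0xAC])` returns `[3, 8364]`, and `decodeFirst([0xE2, 0x82, 0x41])` returns `[2]`.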