g | x | w | all
Bytes Lang Time Link
449GNU sed E240831T145818Z鳴神裁四点一号
200Charcoal240831T210441ZNeil
911Python3240830T173518ZAjax1234

GNU sed -E, 461 457 449 bytes

s/^/;0,q/
:q
s/q/qq/g
t3
:3
s/(,\w+)(,\w+,\w+)(.*)/\1,q\3;q\2/
tq
s/(.*)(;(q+)z)/\3\1/
:4
s/^(q+)\>(.*);((\w+)(,\1(,\w+)|(,\w+),\1))\>/\1\2%\3;\4\6\7/
t4
s/^(q+)\>(.*);((\w+),\1)\>/\1\2%\3;\4z/
t4
s/^q+//
y/%/;/
t3
s/;(q+,q+,\w+)/<\1/g
s/(.*);(q+,q+)/\2\1/
:g
s/^((q+),(q+))\>(.*);(\3(,\w+(,\w+)?))/\1;\2\6\4<\5/
tg
s/^q+,q+//
y/</;/
t3
s/,(a+,)/%\1/g
s/(,q+),a/\1%a/g
s/(.*)(%a+)/q\2&/
:s
s/^q(%a+)\>(.*)\1\>/q\1\2,q/
ts
s/^q%(a+)/;q,\1/
s/%/,/g
tq

Try it online!

Usage

One-line input. Unary state and unary alphabet.

Input := Rule*
Rule := ";" State RewriteList
State := "q"+
RewriteList := ( "," ( State | Alphabet) )+ | Empty
Alphabet:= "a"+
Empty := "z"

One letter state "q" considered to be initial state of input; the output additionally uses "0" to represent initial.

For example if you have the language

S -> AbB | C
C -> b | c
B -> AA | AC
A -> a | ε

you need to translate into this:

;q,qq,aa,qqq;q,qqqq;qqqq,aa;qqqq,aaa;qqq,qq,qq;qqq,qq,qqqq;qq,a;qqz

States S, A, B, C are translated into q, qq, qqq, qqqq respectively; similar for alphabets a, b, c.

Ungolfed 461 bytes answer

I tried with in order of step 1, 3, 4, 5, 2.

Reading hint: I named alphabet labels taken from the qwerty keyboard;

# step 1
s/^/;0,q/

# rename every state; new state q can be introduced
:q
# s/\<q/qq/g # readable alternative
s/q/qq/g
t3

# step 3
:3
s/(,\w+)(,\w+,\w+)(.*)/\1,q\3;q\2/
tq

:r
# step 4
# choose a nullable state
s/(.*)(;(q+)z)/\3\1/
Tw

# add nullable state rules
:4
s/^(q+)\>(.*);((\w+)(,\1(,\w+)|(,\w+),\1))\>/\1\2%\3;\4\6\7/
t4
s/^(q+)\>(.*);((\w+),\1)\>/\1\2%\3;\4z/
t4

# step 4 done for that state
s/q+//
y/%/;/
tr

# step 5
# protect every A -> BC
:w
s/;(q+,q+,\w+)/<\1/g

# choose one A -> B
s/(.*);(q+,q+)/\2\1/

# no need to unprotect A -> BC
# if A -> B and B -> ... add A -> ...
:g
s/^((q+),(q+))\>(.*);(\3(,\w+(,\w+)?))/\1;\2\6\4<\5/
tg

# remove A -> B
s/^q+,q+//

# end of step 5 or just removed A -> B
y/</;/
tw

# step 2
# mark every alphabet a in A -> Ba and A -> aB
s/,(a+,)/%\1/g
s/(,q+),(a+)/\1%\2/g

# choose one alphabet
s/(.*)(%a+)/q\2&/

# now rewrite rules
:s
s/^q(%a+)\>(.*)\1\>/q\1\2,q/
ts
s/^q%(a+)/;q,\1/

# still having any?
s/%/,/g
tq # to be able to get more states

# COMPLETED!
# some rules are duplicated but whatever

Try it online!

Charcoal, 200 bytes

UMθEιEλ∨νLθ⊞θ§θ⁰§≔θ⁰⟦⟦⊖Lθ⟧⟧W⌊EΦEθΦκ∧›Lμ¹‹⌊μ⁰κ⌊Eκ⌊μ«UMθEκEμ⎇⁼ξιLθξ⊞θ⟦⟦ι⟧⟧»W⌊ΦEθ⌊Φκ›Lμ²κ«⊞θ⟦⮌E²⊟ι⟧⊞ι⊖Lθ»WΦ⁻Eθλυ⊙§θκ¬⁻μυFι⊞υκUMθΦιλF№υ⁰⊞§θ⁰⟦⟧FθFΦι⁼Lκ²FΦ²№υ§κλ⊞ιΦκ⁻νλW⌊ΣEθΦκ∧⁼Lμ¹›⌊μ⁰UMθ⁺⁻κ⟦ι⟧⎇№κι§θ⌊ι⟦⟧⭆¹θ

Attempt This Online! Link is to verbose version of code. Takes input as an array of arrays of rules where index 0 represents the start symbol and rules contain positive integers for nonterminals and negative integers for terminals. Explanation:

UMθEιEλ∨νLθ⊞θ§θ⁰§≔θ⁰⟦⟦⊖Lθ⟧⟧

Renumber the start symbol and add a production from 0 to the new symbol.

W⌊EΦEθΦκ∧›Lμ¹‹⌊μ⁰κ⌊Eκ⌊μ

While rules exist that contain terminals but aren't of normal form, ...

«UMθEκEμ⎇⁼ξιLθξ⊞θ⟦⟦ι⟧⟧»

... add a new nonterminal with a production to the terminal and rename all uses of that terminal to the new nonterminal.

W⌊ΦEθ⌊Φκ›Lμ²κ

While a rule exists with more than two nonterminals...

«⊞θ⟦⮌E²⊟ι⟧⊞ι⊖Lθ»

... add a new nonterminal for the last two nonterminals in that rule and update the rule to use that nonterminal instead.

WΦ⁻Eθλυ⊙§θκ¬⁻μυFι⊞υκ

Determine which nonterminals are nullable.

UMθΦιλ

Remove all empty productions.

F№υ⁰⊞§θ⁰⟦⟧

If the start symbol is nullable then add in an empty production.

FθFΦι⁼Lκ²FΦ²№υ§κλ⊞ιΦκ⁻νλ

For each rule that contains a nullable nonterminal, add a rule with that nonterminal removed.

W⌊ΣEθΦκ∧⁼Lμ¹›⌊μ⁰

While a unit rule exists...

UMθ⁺⁻κ⟦ι⟧⎇№κι§θ⌊ι⟦⟧

... update all of the productions that include it to include the rules for its nonterminal instead.

⭆¹θ

Output the final list of productions.

The given example

S -> AbB | C
A -> a | ε
B -> AA | AC
C -> b | c

is represented by the following input array:

[
 [[1, -2, 2], [3]],
 [[-1], []],
 [[1, 1], [1, 3]],
 [[-2], [-3]]]
]

The resulting normal form is

[
 [[1, 6], [-3], [-2], [5, 2], [-2]],
 [[-1]],
 [[1, 1], [1, 3], [-1], [-3], [-2]],
 [[-3], [-2]],
 [[1, 6], [-3], [-2], [5, 2], [-2]],
 [[-2]],
 [[5, 2], [-2]]
]

which can be interpreted as:

S -> AF | c | b | EB | b
A -> a
B -> AA | AC | a | c | b
C -> c | b
D -> AF | c | b | EB | b
E -> b
F -> EB | b

Python3, 911 bytes

import string as S
from itertools import*
L=str.islower
H=lambda x:x.replace('S','T')
V=product
def P(T,G):
 q,s=[b for a,b in G if a==T],[]
 for i in q:
  if''==i:return 1
  for j in map(''.join,V(*[[t]if L(t)else[b for a,b in G if a==t]for t in i])):
   if j==''or not any({*k}=={*j}for k in s):q+=[j];s+=[j]
 return 0
def f(g):
 g=[(H(a),H(b))for a,b in g]
 s,d=[*{*S.ascii_uppercase}-{a for a,_ in g}-{'S'}],{}
 G,N=[],[]
 for a,b in g:
  if all(L(i)for i in b):G+=[(a,b)]
  else:
   n=''
   for i in b:
    if L(i):G+=[(t:=(d.get(i)or s.pop()),i)];d[i]=t;i=t
    n+=i
   N+=[(a,n)]
 for a,b in N:
  if len(b)<3:G+=[(a,b)];continue
  G+=[(a,b[0]+(i:=s.pop()))];N+=[(i,b[1:])]
 K=[]
 G={*G}
 for a,b in G:
  if''!=b:K+=[(a,''.join(j))for j in V(*[[t]+['']*P(t,G)for t in b])]
 F,T=[],[*{*G,*K}]
 for a,b in T:
  if len(b)==1 and not L(b):T+=[(a,B)for A,B in T if A==b]
  else:F+=[(a,b)]
 return [('S','T')]+F

Try it online!