
Bytes	Lang	Time	Link
136	Python 3	251010T054539Z	Random D
114	Python	251010T182813Z	97.100.9
072	Red	251010T071853Z	Galen Iv
318	Java	250929T153647Z	Mark
025	Vyxal 3	240214T185323Z	pacman25
067	JavaScript Node.js	240306T062244Z	l4m2
nan	Scala 2	240215T112740Z	138 Aspe
040	Perl 5 n	240214T200506Z	Xcali
057	MATL	160117T035119Z	Luis Men
185	Python 2	160116T031741Z	TanMath
062	Ruby	160116T230351Z	Flambino
069	JavaScript	160116T230344Z	Benjamin
030	Retina	160116T020205Z	Martin E
112	Haskell	160116T014918Z	nimi

Python 3 143 145 136 bytes

Thanks to @97.100.97.109 for finding an error

Edit: Removed a few bytes. Inspired by @97.100.97.109's use of the walrus operator.

def x(y):
 z=y.find("AUG")
 if z<0:return[]
 return [a for i in range(z,len(y),3)if len(a:=y[i:i+3])==3 and y[i:i+3]not in"UAA UAG UGA"]

Ungolfed:

def parse(rna):
    rna_start = rna.find('AUG')
    codons = []

    if rna_start == -1:
        return []

    for i in range(rna_start, len(rna), 3):
        codon = rna[i:i+3]
        # Stop at the first stop codon or at an incomplete trailing group
        if codon in ('UAA', 'UAG', 'UGA') or len(codon) < 3:
            return codons
        codons.append(codon)
    return codons

I essentially just combined the two conditionals and put them in a list comprehension. Other than that, I just renamed the variables and shortened a condition.

Python, 114 bytes

def x(y):
 for i in range(z:=y.find("AUG"),(len(y)-z)//3*3+z,3):
  if(c:=y[i:i+3])in"UAA UAG UGA":break
  print(c)

Attempt This Online!

Heavily modified version of @Random Dude's code which fixes the error causing incorrect outputs. This version outputs the codons to stdout as opposed to returning them. If you prefer your code to be functional rather than imperative, here's an alternative:

Python, 117 bytes

def x(y,q=0):z=y.find("AUG");return[c for i in range(z,(len(y)-z)//3*3+z,3)if(q:=((c:=y[i:i+3])in"UAA UAG UGA")+q)<1]

Attempt This Online!

Red, 72 bytes

func[b][parse b[collect[to"AUG"any[not["UAA"|"UAG"|"UGA"]keep 3 skip]]]]

Try it online!

Java, 318

String p(String v){int s=v.indexOf("AUG");List<String>g=new ArrayList<>();if(s==-1)return"";Matcher m=Pattern.compile(".{1,3}").matcher(v.substring(s));while(m.find()){if(m.group().length()<3||m.group().equals("UAA")||m.group().equals("UAG")||m.group().equals("UGA"))break;g.add(m.group());}return String.join(",",g);}

String p(String v) {
    int s = v.indexOf("AUG");
    List<String> g = new ArrayList<>();
    if (s == -1) return "";
    Matcher m = Pattern.compile(".{1,3}").matcher(v.substring(s));
    while (m.find()) {
        if (m.group().length() < 3 || m.group().equals("UAA") || m.group().equals("UAG") || m.group().equals("UGA"))
            break;
        g.add(m.group());
    }
    return String.join(",", g);
}

Vyxal 3, 25 bytes

"ᶠx„ẋİ⁻/3Ŀ:ƛ'u"\ᵇ„o+c]Ṙƒh

Try it Online!

There's my 26 25 byter

old explanation:

"ᶠx„ẋİ⁻/3Ŀ:ƛ'u"\ᵇ„o+=a]Ṙƒh⁡‎‎⁪⁡⁪⁠⁪⁡⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁪‏⁠‎⁪⁡⁪⁠⁪⁣⁪‏⁠‎⁪⁡⁪⁠⁪⁤⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁡⁪‏⁠⁪⁪‏⁡⁠⁡‌⁢‎⁪⁪⁠⁪⁪⁠⁪⁪⁠⁪⁪⁠⁪⁪⁠‎⁪⁡⁪⁠⁪⁢⁢⁪‏‏⁡⁠⁡‌⁣‎⁪⁪⁠⁪⁪⁠⁪⁪⁠⁪⁪⁠‎⁪⁡⁪⁠⁪⁢⁣⁪‏⁠⁪⁪⁠⁪⁪⁠⁪⁪⁠⁪⁪‏⁡⁠⁡‌⁤‎‎⁪⁡⁪⁠⁪⁢⁤⁪‏⁠‎⁪⁡⁪⁠⁪⁣⁡⁪‏⁠‎⁪⁡⁪⁠⁪⁣⁢⁪‏⁠‎⁪⁡⁪⁠⁪⁣⁣⁪‏‏⁡⁠⁡‌⁢⁡‎‎⁪⁡⁪⁠⁪⁣⁤⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁢⁡⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁢⁢⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁢⁣⁪‏‏⁡⁠⁡‌⁢⁢‎‎⁪⁡⁪⁠⁪⁢⁡⁣⁪‏‏⁡⁠⁡‌⁢⁣‎‎⁪⁡⁪⁠⁪⁤⁣⁪‏⁠‎⁪⁡⁪⁠⁪⁤⁤⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁡⁡⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁡⁢⁪‏⁠⁪⁪‏⁡⁠⁡‌⁢⁤‎‎⁪⁡⁪⁠⁪⁤⁡⁪‏⁠‎⁪⁡⁪⁠⁪⁤⁢⁪‏‏⁡⁠⁡‌⁣⁡‎‎⁪⁡⁪⁠⁪⁢⁢⁤⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁣⁡⁪‏⁠‎⁪⁡⁪⁠⁪⁢⁣⁢⁪‏‏⁡⁠⁡‌
"ᶠx„ẋ                       # ‎⁡The first index of "aug"
     İ                      # ‎⁢Slice from here to the end
      ⁻                     # ‎⁣Split into parts of length 3
       /3Ŀ:                 # ‎⁤Keep only those whose length is 3 and duplicate
           ƛ        =a]     # ‎⁢⁡for each codon, does it equal any of...
                  o         # ‎⁢⁢overlapping pairs of....
              "\ᵇ„          # ‎⁢⁣The string "aaga" --> ["aa", "ag", "ga"]
            'u              # ‎⁢⁤with a "u" prepended to each ["uaa", "uag", "uga"]
                       Ṙƒh  # ‎⁣⁡partition before truthy indices, take the first item.
💎

Created with the help of Luminespire.
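
For readers who don't know Vyxal, the steps in the old explanation can be sketched in Python. This is only an illustration of the idea, not a translation of Vyxal semantics: the function names `stop_codons` and `coding_region` are made up for this sketch, and it works on uppercase input rather than the lowercase strings shown in the explanation.

```python
def stop_codons(seed="AAGA"):
    # Overlapping pairs of the seed string: "AA", "AG", "GA"
    pairs = [seed[i:i + 2] for i in range(len(seed) - 1)]
    # Prepend "U" to each pair to get the three stop codons
    return ["U" + p for p in pairs]


def coding_region(rna):
    start = rna.find("AUG")              # first index of the start codon
    if start < 0:
        return []
    # Slice from the start codon and split into parts of length 3
    chunks = [rna[i:i + 3] for i in range(start, len(rna), 3)]
    full = [c for c in chunks if len(c) == 3]   # keep only complete codons
    stops = stop_codons()
    result = []
    for codon in full:
        if codon in stops:               # cut before the first stop codon
            break
        result.append(codon)
    return result


print(stop_codons())                                  # ['UAA', 'UAG', 'UGA']
print(coding_region("ACAUGGAUGGACUGUAACCCCAUGC"))     # ['AUG', 'GAU', 'GGA', 'CUG']
```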

JavaScript (Node.js), 67 bytes

s=>[/AUG(...)*?(?=UA[AG]|UGA|.?.?$)|$/.exec(s)[0].match(/.../g)]+''

Try it online!

Scala 2, 170 156 bytes

A port of @nimi's Haskell answer in Scala.

Saved 14 bytes thanks to @pacman256

Golfed version. Attempt This Online!

s=>{val q=s.indexOf("AUG");if(q>=0){val c=s.drop(q).grouped(3).toSeq;c.takeWhile(c=>c.size==3&& !Seq("UAA","UAG","UGA").contains(c))}else Seq.empty[String]}

Ungolfed version. Attempt This Online!

object RNASequenceProcessor {

  def main(args: Array[String]): Unit = {
    val rnaSequence = "AUGCUUAUGAAUGGCAUGUACUAAUAGACUCACUUAAGCGGUGAUGAA"
    val codingRegion = findCodingRegion(rnaSequence)
    println("["++codingRegion.mkString(",")++"]")
  }

  def findCodingRegion(sequence: String): Seq[String] = {
    // Find the start of the coding region (first occurrence of "AUG")
    val startOfCoding = sequence.indexOf("AUG")
    if (startOfCoding != -1) {
      // Extract the sequence from "AUG" onward
      val codingSequence = sequence.drop(startOfCoding)
      
      // Split the sequence into codons (chunks of 3 nucleotides)
      val codons = codingSequence.grouped(3).toSeq
      
      // Take codons until a stop codon is encountered or the codon length is less than 3
      codons.takeWhile(codon => codon.length == 3 && !Seq("UAA", "UAG", "UGA").contains(codon))
    } else {
      Seq.empty[String] // Return an empty sequence if "AUG" is not found
    }
  }
}

Perl 5 `-n`, 40 bytes

map/UAA|UAG|UGA/?last:say,/AUG|\B\G.../g

Try it online!

MATL, 57 bytes

j'AUG(...)*?(?=(UAA|UAG|UGA|.?.?$))'XXtn?1X)tnt3\-:)3[]e!

This uses the current version (9.3.1) of the language/compiler.

Input and output are through stdin and stdout. The output is separated by linebreaks.

Example

>> matl
 > j'AUG(...)*?(?=(UAA|UAG|UGA|.?.?$))'XXtn?1X)tnt3\-:)3[]e!
 >
> ACAUGGAUGGACUGUAACCCCAUGC
AUG
GAU
GGA
CUG

EDIT (June 12, 2016): to adapt to changes in the language, [] should be removed. The link below includes that modification.

Try it online!

Explanation

The code is based on the regular expression

AUG(...)*?(?=(UAA|UAG|UGA|.?.?$))

This matches substrings starting with AUG, containing groups of three characters (...) and ending in either UAA, UAG, or UGA; or ending at the end of the string, and in this case there may be one last incomplete group (.?.?$). Lookahead ((?=...)) is used so that the stop codons are not part of the match. The matching is lazy (*?) in order to finish at the first stop codon found, if any.

j                                     % input string
'AUG(...)*?(?=(UAA|UAG|UGA|.?.?$))'   % regex
XX                                    % apply it. Push cell array of matched substrings
tn?                                   % if non-empty
1X)                                   % get first substring
tnt3\-:)                              % make length the largest possible multiple of 3
3[]e!                                 % reshape into rows of 3 columns
                                      % implicit endif
                                      % implicit display
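
The same regular expression can be tried outside MATL. The following is a rough Python sketch of the steps annotated above, using the standard `re` module; the name `codons_by_regex` is an assumption made for this example, and the function returns a list instead of displaying a reshaped character array.

```python
import re

# Same pattern as in the MATL answer: lazy groups of three after AUG,
# with a lookahead for a stop codon or an incomplete tail at the end.
PATTERN = re.compile(r"AUG(...)*?(?=(UAA|UAG|UGA|.?.?$))")


def codons_by_regex(rna):
    match = PATTERN.search(rna)
    if match is None:                    # "if non-empty" in the MATL code
        return []
    s = match.group(0)                   # first matched substring
    # Mirror the "largest possible multiple of 3" step; the match produced
    # by this pattern already has such a length, so this changes nothing here.
    s = s[:len(s) - len(s) % 3]
    # Reshape into groups of three characters
    return [s[i:i + 3] for i in range(0, len(s), 3)]


print(codons_by_regex("ACAUGGAUGGACUGUAACCCCAUGC"))   # ['AUG', 'GAU', 'GGA', 'CUG']
```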

Python 2, 185 bytes

i=input()
o=[]
if i.find('AUG')>=0:i=map(''.join,zip(*[iter(i[i.find('AUG'):])]*3))
else:print "";exit()
for j in i:
 if j not in['UGA','UAA','UAG']:o+=[j]
 else:break
print ','.join(o)

Explanation

Set i to the input. Slice it from the first 'AUG' to the end, then split the result into strings of three characters. Collect codons until a stop codon appears, then stop and print what was collected.

Try it here
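
The grouping step relies on the `zip(*[iter(s)]*3)` idiom, which may not be obvious at first sight. Below is a small Python 3 sketch of the same approach, written out long-hand; the names `triples` and `coding_region_zip` are invented for this illustration and do not appear in the answer.

```python
STOPS = ("UGA", "UAA", "UAG")


def triples(s):
    it = iter(s)
    # zip pulls from the same iterator three times per tuple, so it yields
    # consecutive groups of three and silently drops an incomplete tail.
    return ["".join(t) for t in zip(it, it, it)]


def coding_region_zip(rna):
    start = rna.find("AUG")
    if start < 0:
        return []
    out = []
    for codon in triples(rna[start:]):
        if codon in STOPS:               # cut at the first stop codon
            break
        out.append(codon)
    return out


print(",".join(coding_region_zip("ACAUGGAUGGACUGUAACCCCAUGC")))   # AUG,GAU,GGA,CUG
```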

Ruby, 97 95 78 75 62 bytes

->(r){r.scan(/AUG|\B\G.../).join(?,).sub(/,U(AA|AG|GA).*/,'')}

I don't golf much, so I'm sure it can be improved.

Edit: ~~Stole~~ Borrowed Martin Büttner's excellent \B\G trick

JavaScript 88 82 70 69 chars

s=>/AUG(...)+?(?=(U(AA|AG|GA)|$))/.exec(s)[0].match(/.../g).join(",")

Usage Example:

(s=>/AUG(...)+?(?=(U(AA|AG|GA)|$))/.exec(s)[0].match(/.../g).join(","))("ACAUGGAUGGACUGUAACCCCAUGC")

Retina, 39 38 32 30 bytes

M!`AUG|\B\G...
U(AA|AG|GA)\D*

The trailing linefeed is significant.

Output as a linefeed-separated list.

Try it online.

Explanation

M!`AUG|\B\G...

This is a match stage which turns the input into a linefeed-separated list of all matches (due to the !). The regex itself matches every codon starting from the first AUG. We achieve this with two separate options. AUG matches unconditionally, so that it can start the list of matches. The second match can be any codon (... matches any three characters), but the \G is a special anchor which ensures that this can only match right after another match. The only problem is that \G also matches at the beginning of the string, which we don't want. Since the input consists only of word characters, we use \B (any position that is not a word boundary) to ensure that this match is not used at the beginning of the input.

U(AA|AG|GA)\D*

This finds the first stop codon, matched as U(AA|AG|GA) as well as everything after it and removes it from the string. Since the first stage split the codons into separate lines, we know that this match is properly aligned with the start codon. We use \D (non-digits) to match any character, since . wouldn't go past the linefeeds, and the input won't contain digits.
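
Python's standard `re` module has no \G anchor, so the two stages cannot be copied verbatim, but the behaviour can be approximated. The sketch below is an emulation under that assumption: `Pattern.match(string, pos)` only matches at `pos`, which plays the role of \G, and `str.find` stands in for the unconditional AUG alternative. The name `retina_like` is hypothetical.

```python
import re

CODON = re.compile(r"...")                 # any three characters
STOP_AND_REST = re.compile(r"U(AA|AG|GA)\D*")


def retina_like(rna):
    # Stage 1: list every codon from the first AUG onward, one per line.
    pos = rna.find("AUG")
    if pos < 0:
        return ""
    matches = []
    while True:
        m = CODON.match(rna, pos)          # anchored at pos, like \G
        if m is None:                      # fewer than three characters left
            break
        matches.append(m.group(0))
        pos = m.end()
    listing = "\n".join(matches)
    # Stage 2: delete the first stop codon and everything after it.
    # \D also matches linefeeds, so the match runs to the end of the text.
    return STOP_AND_REST.sub("", listing, count=1)


print(retina_like("ACAUGGAUGGACUGUAACCCCAUGC").split())   # ['AUG', 'GAU', 'GGA', 'CUG']
```

Because each line holds exactly one codon, a stop codon can only match when it is aligned with the start codon, as the explanation above points out.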

Haskell, 115 112 bytes

import Data.Lists
fst.break(\e->elem e["UAA","UAG","UGA"]||length e<3).chunksOf 3.snd.spanList((/="AUG").take 3)

Usage example:

*Main> ( fst.break(\e->elem e["UAA","UAG","UGA"]||length e<3).chunksOf 3.snd.spanList((/="AUG").take 3) ) "AUGCUUAUGAAUGGCAUGUACUAAUAGACUCACUUAAGCGGUGAUGAA"
["AUG","CUU","AUG","AAU","GGC","AUG","UAC"]

How it works:

                spanList((/="AUG").take 3)  -- split input at the first "AUG"
             snd                            -- take 2nd part ("AUG" + rest)
     chunksOf 3                             -- split into 3 element lists
fst.break(\e->                              -- take elements from this list
           elem e["UAA","UAG","UGA"]||      -- as long as we don't see end codons
           length e<3)                      -- or run out of full codons
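
For comparison, the same pipeline can be sketched in Python, with `itertools.takewhile` playing the role of `fst.break` (with the condition negated). This is an illustrative rewrite, not part of the answer, and the name `coding_region_takewhile` is made up here.

```python
from itertools import takewhile

STOPS = {"UAA", "UAG", "UGA"}


def coding_region_takewhile(rna):
    start = rna.find("AUG")                # split input at the first "AUG"
    if start < 0:
        return []
    rest = rna[start:]                     # "AUG" + rest
    # Split into chunks of three (the last one may be shorter)
    chunks = (rest[i:i + 3] for i in range(0, len(rest), 3))
    # Take chunks as long as they are complete and not a stop codon
    return list(takewhile(lambda c: len(c) == 3 and c not in STOPS, chunks))


print(coding_region_takewhile("AUGCUUAUGAAUGGCAUGUACUAAUAGACUCACUUAAGCGGUGAUGAA"))
# ['AUG', 'CUU', 'AUG', 'AAU', 'GGC', 'AUG', 'UAC']
```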
