| Bytes | Lang | Time | Link |
|---|---|---|---|
| 136 | Python 3 | 251010T054539Z | Random D |
| 114 | Python | 251010T182813Z | 97.100.9 |
| 072 | Red | 251010T071853Z | Galen Iv |
| 318 | Java | 250929T153647Z | Mark |
| 025 | Vyxal 3 | 240214T185323Z | pacman25 |
| 067 | JavaScript Node.js | 240306T062244Z | l4m2 |
| 156 | Scala 2 | 240215T112740Z | 138 Aspe |
| 040 | Perl 5 n | 240214T200506Z | Xcali |
| 057 | MATL | 160117T035119Z | Luis Men |
| 185 | Python 2 | 160116T031741Z | TanMath |
| 062 | Ruby | 160116T230351Z | Flambino |
| 069 | JavaScript | 160116T230344Z | Benjamin |
| 030 | Retina | 160116T020205Z | Martin E |
| 112 | Haskell | 160116T014918Z | nimi |
Python 3, 143 145 136 bytes
Thanks to @97.100.97.109 for finding an error
Edit: Removed a few bytes. Inspired by @97.100.97.109's use of the walrus operator.
def x(y):
 z=y.find("AUG")
 if z<0:return[]
 return [a for i in range(z,len(y),3)if len(a:=y[i:i+3])==3 and y[i:i+3]not in"UAA UAG UGA"]
Ungolfed:
def parse(rna):
    rna_start = rna.find('AUG')
    codons = []
    if rna_start == -1:
        return []
    for i in range(rna_start, len(rna), 3):
        codon = rna[i:i+3]
        # Stop at the first stop codon; it is not part of the output
        if codon in ('UAA', 'UAG', 'UGA'):
            return codons
        # Only keep complete codons
        if len(codon) == 3:
            codons.append(codon)
    return codons
I essentially just combined the two conditionals and put them into a list comprehension. Other than that, I just renamed and shortened a condition.
Python, 114 bytes
def x(y):
 for i in range(z:=y.find("AUG"),(len(y)-z)//3*3+z,3):
  if(c:=y[i:i+3])in"UAA UAG UGA":break
  print(c)
Heavily modified version of @Random Dude's code which fixes the error causing incorrect outputs. This version outputs the codons to stdout as opposed to returning them. If you prefer your code to be functional rather than imperative, here's an alternative:
Python, 117 bytes
def x(y,q=0):z=y.find("AUG");return[c for i in range(z,(len(y)-z)//3*3+z,3)if(q:=((c:=y[i:i+3])in"UAA UAG UGA")+q)<1]
Red, 72 bytes
func[b][parse b[collect[to"AUG"any[not["UAA"|"UAG"|"UGA"]keep 3 skip]]]]
Java, 318
String p(String v){int s=v.indexOf("AUG");List<String>g=new ArrayList<>();if(s==-1)return"";Matcher m=Pattern.compile(".{1,3}").matcher(v.substring(s));while(m.find()){if(m.group().length()<3||m.group().equals("UAA")||m.group().equals("UAG")||m.group().equals("UGA"))break;g.add(m.group());}return String.join(",",g);}
String p(String v) {
    int s = v.indexOf("AUG");
    List<String> g = new ArrayList<>();
    if (s == -1) return "";
    Matcher m = Pattern.compile(".{1,3}").matcher(v.substring(s));
    while (m.find()) {
        if (m.group().length() < 3 || m.group().equals("UAA") || m.group().equals("UAG") || m.group().equals("UGA"))
            break;
        g.add(m.group());
    }
    return String.join(",", g);
}
Vyxal 3, 25 bytes
"ᶠx„ẋİ⁻/3Ŀ:ƛ'u"\ᵇ„o+c]Ṙƒh
There's my 26 25 byter
old explanation:
"ᶠx„ẋİ⁻/3Ŀ:ƛ'u"\ᵇ„o+=a]Ṙƒh
"ᶠx„ẋ # The first index of "aug"
İ # Slice from here to the end
⁻ # Split into parts of length 3
/3Ŀ: # Keep only those whose length is 3 and duplicate
ƛ =a] # for each codon, does it equal any of...
o # overlapping pairs of....
"\ᵇ„ # The string "aaga" --> ["aa", "ag", "ga"]
'u # with a "u" prepended to each ["uaa", "uag", "uga"]
Ṙƒh # partition before truthy indices, take the first item.
💎
Created with the help of Luminespire.
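For readers who don't read Vyxal, the stop-codon trick from the explanation above can be sketched in Python. This is only an illustration under assumptions: `stop_codons` and `coding_codons` are hypothetical names, the input is taken to be lowercase as in the explanation, and returning an empty list when there is no "aug" is my choice rather than something the Vyxal program is confirmed to do.

```python
def stop_codons(seed="aaga"):
    # Overlapping pairs of "aaga" are "aa", "ag", "ga";
    # prepending "u" to each gives the three stop codons.
    return ["u" + seed[i:i + 2] for i in range(len(seed) - 1)]


def coding_codons(rna):
    # Slice from the first "aug" to the end (assumption: empty result if absent).
    start = rna.find("aug")
    if start < 0:
        return []
    tail = rna[start:]
    # Split into chunks of 3 and keep only the full-length ones.
    chunks = [tail[i:i + 3] for i in range(0, len(tail), 3)]
    chunks = [c for c in chunks if len(c) == 3]
    # Keep everything before the first stop codon.
    stops = stop_codons()
    result = []
    for c in chunks:
        if c in stops:
            break
        result.append(c)
    return result
```

Building the stop codons from the four-character seed is what saves bytes in the golfed program; in ordinary code a literal tuple would be clearer.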
JavaScript (Node.js), 67 bytes
s=>[/AUG(...)*?(?=UA[AG]|UGA|.?.?$)|$/.exec(s)[0].match(/.../g)]+''
Scala 2, 170 156 bytes
A port of @nimi's Haskell answer in Scala.
Saved 14 bytes thanks to @pacman256
Golfed version:
s=>{val q=s.indexOf("AUG");if(q>=0){val c=s.drop(q).grouped(3).toSeq;c.takeWhile(c=>c.size==3&& !Seq("UAA","UAG","UGA").contains(c))}else Seq.empty[String]}
Ungolfed version:
object RNASequenceProcessor {
  def main(args: Array[String]): Unit = {
    val rnaSequence = "AUGCUUAUGAAUGGCAUGUACUAAUAGACUCACUUAAGCGGUGAUGAA"
    val codingRegion = findCodingRegion(rnaSequence)
    println("["++codingRegion.mkString(",")++"]")
  }

  def findCodingRegion(sequence: String): Seq[String] = {
    // Find the start of the coding region (first occurrence of "AUG")
    val startOfCoding = sequence.indexOf("AUG")
    if (startOfCoding != -1) {
      // Extract the sequence from "AUG" onward
      val codingSequence = sequence.drop(startOfCoding)
      // Split the sequence into codons (chunks of 3 nucleotides)
      val codons = codingSequence.grouped(3).toSeq
      // Take codons until a stop codon is encountered or the codon length is less than 3
      codons.takeWhile(codon => codon.length == 3 && !Seq("UAA", "UAG", "UGA").contains(codon))
    } else {
      Seq.empty[String] // Return an empty sequence if "AUG" is not found
    }
  }
}
MATL, 57 bytes
j'AUG(...)*?(?=(UAA|UAG|UGA|.?.?$))'XXtn?1X)tnt3\-:)3[]e!
This uses the current version (9.3.1) of the language/compiler.
Input and output are through stdin and stdout. The output is separated by linebreaks.
Example
>> matl
> j'AUG(...)*?(?=(UAA|UAG|UGA|.?.?$))'XXtn?1X)tnt3\-:)3[]e!
>
> ACAUGGAUGGACUGUAACCCCAUGC
AUG
GAU
GGA
CUG
EDIT (June 12, 2016): to adapt to changes in the language, [] should be removed. The link below includes that modification
Explanation
The code is based on the regular expression
AUG(...)*?(?=(UAA|UAG|UGA|.?.?$))
This matches substrings starting with AUG, containing groups of three characters (...) and ending in either UAA, UAG, or UGA; or ending at the end of the string, and in this case there may be one last incomplete group (.?.?$). Lookahead ((?=...)) is used so that the stop codons are not part of the match. The matching is lazy (*?) in order to finish at the first stop codon found, if any.
j % input string
'AUG(...)*?(?=(UAA|UAG|UGA|.?.?$))' % regex
XX % apply it. Push cell array of matched substrings
tn? % if non-empty
1X) % get first substring
tnt3\-:) % make length the largest possible multiple of 3
3[]e! % reshape into rows of 3 columns
% implicit endif
% implicit display
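The regular expression described in the explanation above is not specific to MATL. As a rough sketch, the same pattern can be tried with Python's `re` module; the names `CODING` and `codons` are hypothetical, and returning an empty list when nothing matches is an assumption of this sketch.

```python
import re

# Lazy groups of three after AUG, ending just before a stop codon
# or before at most two leftover characters at the end of the string.
CODING = re.compile(r"AUG(...)*?(?=(UAA|UAG|UGA|.?.?$))")


def codons(rna):
    m = CODING.search(rna)
    if m is None:
        # No start codon anywhere in the input.
        return []
    region = m.group(0)
    # The match is AUG plus whole groups of three, so its length is a multiple of 3.
    return [region[i:i + 3] for i in range(0, len(region), 3)]
```

Because the lookahead is only ever tested after a whole number of three-character groups, a stop codon that is out of frame relative to the start codon is ignored.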
Python 2, 185 bytes
i=input()
o=[]
if i.find('AUG')>=0:i=map(''.join,zip(*[iter(i[i.find('AUG'):])]*3))
else:print "";exit()
for j in i:
 if j not in['UGA','UAA','UAG']:o+=[j]
 else:break
print ','.join(o)
Explanation
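Set i to the input. Slice it from the first 'AUG' to the end and split that into strings of three characters (a trailing incomplete codon is dropped). Collect codons until a stop codon is found, then print them joined by commas. If there is no 'AUG', print an empty line and exit.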
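The "split into strings of three" step relies on the `zip(*[iter(s)]*3)` idiom, which can be hard to read. Here is a small Python 3 sketch of that step on its own; `full_codons` and `coding_region` are hypothetical names, and the functions return lists instead of printing, which differs from the answer above.

```python
def full_codons(rna):
    # Start at the first "AUG"; nothing to report if there is none.
    start = rna.find("AUG")
    if start < 0:
        return []
    # Passing the SAME iterator to zip three times pulls three characters
    # per tuple. zip stops at the shortest input, so a trailing partial
    # codon is silently dropped.
    it = iter(rna[start:])
    return ["".join(triple) for triple in zip(it, it, it)]


def coding_region(rna):
    out = []
    for codon in full_codons(rna):
        if codon in ("UAA", "UAG", "UGA"):
            break  # stop codon: cut here, excluding it
        out.append(codon)
    return out
```

`zip(it, it, it)` is the unpacked form of `zip(*[iter(s)]*3)`: the list multiplication copies a reference to one iterator, not the iterator itself.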
Ruby, 97 95 78 75 62 bytes
->(r){r.scan(/AUG|\B\G.../).join(?,).sub(/,U(AA|AG|GA).*/,'')}
I don't golf much, so I'm sure it can be improved.
Edit: Borrowed Martin Büttner's excellent \B\G trick
JavaScript, 88 82 70 69 chars
s=>/AUG(...)+?(?=(U(AA|AG|GA)|$))/.exec(s)[0].match(/.../g).join(",")
Usage Example:
(s=>/AUG(...)+?(?=(U(AA|AG|GA)|$))/.exec(s)[0].match(/.../g).join(","))("ACAUGGAUGGACUGUAACCCCAUGC")
Retina, 39 38 32 30 bytes
M!`AUG|\B\G...
U(AA|AG|GA)\D*
The trailing linefeed is significant.
Output as a linefeed-separated list.
Explanation
M!`AUG|\B\G...
This is a match stage which turns the input into a linefeed-separated list of all matches (due to the !). The regex itself matches every codon starting from the first AUG. We achieve this with two separate options. AUG matches unconditionally, so that it can start the list of matches. The second match can be any codon (... matches any three characters), but the \G is a special anchor which ensures that this can only match right after another match. The only problem is that \G also matches at the beginning of the string, which we don't want. Since the input consists only of word characters, we use \B (any position that is not a word boundary) to ensure that this match is not used at the beginning of the input.
U(AA|AG|GA)\D*
This finds the first stop codon, matched as U(AA|AG|GA) as well as everything after it and removes it from the string. Since the first stage split the codons into separate lines, we know that this match is properly aligned with the start codon. We use \D (non-digits) to match any character, since . wouldn't go past the linefeeds, and the input won't contain digits.
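The two stages can be imitated in Python as a rough sketch. Python's standard `re` module has no `\G` anchor, so the "match right after the previous match" behaviour is emulated here by restarting `match` at the previous end position; that substitution, and the names `match_stage`, `replace_stage` and `coding_region`, are assumptions of this sketch and not how Retina works internally.

```python
import re

CODON = re.compile(r"...")
STOP_AND_REST = re.compile(r"U(AA|AG|GA)\D*")


def match_stage(rna):
    # Begin at the first AUG, then keep matching three characters
    # directly after the previous match (the role \G plays in Retina).
    pos = rna.find("AUG")
    matches = []
    while pos >= 0:
        m = CODON.match(rna, pos)
        if m is None:
            break  # fewer than three characters left
        matches.append(m.group(0))
        pos = m.end()
    return "\n".join(matches)


def replace_stage(lines):
    # Remove the first stop codon and everything after it.
    # \D also matches linefeeds, which "." would not.
    return STOP_AND_REST.sub("", lines)


def coding_region(rna):
    return replace_stage(match_stage(rna)).splitlines()
```

Since every line produced by the first stage is exactly three characters long, the stop-codon pattern can only match a whole line, which is why no extra alignment check is needed in the second stage.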
Haskell, 115 112 bytes
import Data.Lists
fst.break(\e->elem e["UAA","UAG","UGA"]||length e<3).chunksOf 3.snd.spanList((/="AUG").take 3)
Usage example:
*Main> ( fst.break(\e->elem e["UAA","UAG","UGA"]||length e<3).chunksOf 3.snd.spanList((/="AUG").take 3) ) "AUGCUUAUGAAUGGCAUGUACUAAUAGACUCACUUAAGCGGUGAUGAA"
["AUG","CUU","AUG","AAU","GGC","AUG","UAC"]
How it works:
spanList((/="AUG").take 3) -- split input at the first "AUG"
snd -- take 2nd part ("AUG" + rest)
chunksOf 3 -- split into 3 element lists
fst.break(\e-> -- take elements from this list
elem e["UAA","UAG","UGA"]|| -- as long as we don't see end codons
length e<3) -- or run out of full codons
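The same pipeline reads naturally in Python with `itertools.takewhile`. This is a sketch for comparison, not a translation the author provided; `STOP` and `coding_region` are hypothetical names.

```python
from itertools import takewhile

STOP = ("UAA", "UAG", "UGA")


def coding_region(rna):
    # Split the input at the first "AUG" and keep the second part.
    start = rna.find("AUG")
    if start < 0:
        return []
    tail = rna[start:]
    # Split into chunks of three characters (the last one may be shorter).
    chunks = (tail[i:i + 3] for i in range(0, len(tail), 3))
    # Take chunks while they are full codons and not stop codons.
    return list(takewhile(lambda c: len(c) == 3 and c not in STOP, chunks))
```

Usage, with the input from the Haskell example:

```python
print(coding_region("AUGCUUAUGAAUGGCAUGUACUAAUAGACUCACUUAAGCGGUGAUGAA"))
# ['AUG', 'CUU', 'AUG', 'AAU', 'GGC', 'AUG', 'UAC']
```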