Don't reinvent the wheel!
Updated: May 24
Reinvent the wheel: to waste time trying to create something that someone else has already created.
Recently, I walked over to my colleague's desk in the room next to mine and casually asked what he was up to. He told me he was pre-processing some data, and for that he was writing a code snippet to extract some sequences from a human genome FASTA file based on their headers. I was stunned, mouth wide open! "Who writes code for that?" I said to myself. This has been attempted no fewer than ten thousand times on earth.
The next second came the self-realization. "This is exactly what you were doing when you started out in bioinformatics," said my inner voice. Hence, I silently acquitted my colleague.
The problem with knowing how to program a computer is that we often start doing it without a second's thought. About 90% of small tasks are already solved. Many daily tasks have a ready-made solution: a super-fast, intuitive tool just waiting to be installed and used. Yet what we choose to do is write our own "FASTA manipulator".
There are two fantastic ways to avoid such incidents.
1. A simple Google search
Google is your best friend. Just google the task you are about to start; who knows, someone may already have an answer. Biostars.org is a great place to look. Use Google's "site:" operator to restrict the search to that site, that is, put site:biostars.org in front of your query.
2. Get hold of a well-established tool
For simple tasks like manipulating FASTA files, or editing and extracting something from tabular or CSV files, use the well-established tools that are already available. I never write anything myself to manipulate FASTA files. I just use a tool called seqkit. Have a look at this example:
Here I have a file called headers.txt containing the headers of the sequences I want to extract from hairpin.fa.gz. The command below gives me exactly that in about a second!
zcat hairpin.fa.gz | seqkit grep -f headers.txt > new.fa
There are thousands of such tools out there. Pick your favorite and never look back.
We often overcomplicate utterly simple tasks (reinventing the wheel?) when it is not required at all. It is tempting to reinvent the wheel, and you certainly can. But do it only when the existing wheel is not round and everyone keeps rolling it as if it were.
Just because you can do it doesn't mean you should!