40 years ago, all of GenBank was published in print form by NAR. The same format today would require over 4 light seconds of shelf space. To a year of progress in 2025.
Comments
Lovely! A little bit earlier was Margaret Dayhoff’s Atlas of Protein Sequence and Structure. It has all the sequences printed together 😉 and some lovely structures, the definition of mutation data matrices and sequence databases. 1st edition was 1965
...A groundbreaking book that was right at the beginning of what we now call bioinformatics. Many of the concepts are in daily use today often without realising…
..then of course there is Kabat's "Sequences of Immunological Interest" which started in 1970 I think? The 5th printed edition was published in 1991. I met Kabat at NCBI when I was a visiting scientist there in 1989/90. I used his computer on the days he was not in the office!
Circa 1985 I was working at Columbia.
I walk out of the lab into the hallway holding a rack of test tubes (glass, of course) full of E. coli overnights, and this guy, maybe 5 foot 2, maybe 90 pounds,
CHARGES down the corridor, knocks into me, says "sorry, gotta run" & he's gone.
"...beginning of what we now call bioinformatics."
in fact, at the very beginning there was a deeply felt and unprintable curse when it came to typing these nicely printed sequences into the computer, preferably without typos!
Yes, in the early days of NCBI they used to triple enter the DNA sequences printed in journals. They rapidly persuaded publishers to require deposition to EMBL/GenBank for publication which made this practice obsolete (thankfully!).
When I started graduate school at Caltech in 1980, Lee Hood had a senior graduate student who had written his own DNA & protein sequence analysis software — in BASIC — to run on some sort of physically gargantuan HP "nanocomputer." He spent the winter break rewriting it in FORTRAN...
The software was able to generate dot-matrix homology plots, printable on a very large format 4-color HP drafting printer. Given the complex inverted & tandem repetitions in my gene region, the output had mesmerizing patterns & made terrific wall art.
Rudimentary restriction maps and schematics could be drawn, but only after _hand-coding_ a plotter program that specified the x,y coordinates for every pen-drop and -lift and color. It was how you drew straight lines, boxes, other graphic primitives. (I had a deep appreciation for MacDraw in 1985!)
Hood also employed a 40hr/week research technician whose sole job duties were to go to the library, photocopy EVERY new sequence paper, and then hand-enter the data into the home-rolled file format used by the graduate student's software. At first she had slack time each week for glassware patrol.
When I described all this to a clever undergraduate in my own lab in 1995, he stroked his scraggly 19-year-old's beard and intoned, "I guess there had to be a time before GenBank."
I was looking at the sequence for a virus I had been studying for 10 years, and the sequence was ALL WRONG. Then I realized it had been entered into GenBank 3' to 5', so I write a polite letter and GenBank responds:
Our rule is that
Turns out the original depositor for the 3' to 5' virus genome had passed away, so GenBank left it.
:)
But he miscalculated by a factor of ten.
The Internet social group laughed pretty loudly at his error.
Two books; one closed with "Nucleotide Sequences 1984 Part 1" on the cover, and one open showing data-filled pages.
Transcribed Text:
NUCLEOTIDE SEQUENCES 1984 PART 1
A compilation from the GenBank and EMBL data libraries
A special supplement to Nucleic Acids Research
IRL PRESS
Published in print form, ha ha: love it.