It all began when I ran across DriveThruComics … or, to be fair, when I long ago began reading Phil and Kaja Foglio’s «Girl Genius» [1] comic. At DriveThru I discovered the delight that is buying the entire series anew …
Just kidding, of course.
DriveThru sell comics in digital form – watermarked PDFs, to be precise. One of them is the above–mentioned Foglio production; a series I enjoy so much I found myself quite amenable to the idea of actually bringing copies with me to re–read without risking them being more damaged [2] or lost.
As my company has an iPad 2 in the lab, and as I am the company, I bought a copy of volume One [3], stressed my way through iTunes, and tried out my first digital comic. It was luverly.
Oh, the iPad is a crap e–book reader, and makes an equally unwieldy, heavy and slippery comic–book reader, but the idea, now, that was nice – and there are other tablets out there.
I have, over forty–three sinful years, amassed quite a few albums, several of which are very, very VERY unlikely to be re–released. Some were never meant to be collected [6], and acid–free paper in 1977 was a broadsheet without any drug–related stories. Some colours have faded, some that were not present at the start are there now [7], and, admittedly, some issues are rather worn from repeated reading. Scanning will help preserve them.
No sooner had the idea landed than I grabbed an old album in bad shape (the Swedish edition of Franquin’s 1957 «Le repaire de la murène», «Vrakmysteriet» from 1975), and tossed [8] it in the scanner. The result was abysmal – nay, much worse. I backtracked, and established some rules:
- The album must be digitised as–is. No. Cutting. Anywhere.
- Quality is paramount. Most everything else is secondary.
- Cost is secondary, but not that secondary.
- If possible, colours should be restored.
- The project is due before I retire – that is, I have no more than twenty–two years to do it in.
- Two copies should be made: one high–quality, high resolution for archival, one PDF for reading.
- Archive copies should be stored as lossless data, if compressed at all.
- The final, digital, album must be of sufficient resolution for future usage.
Work done by UNESCO or the US Library of Congress is impressive, but takes time and is often costly. Professional equipment always is.
Yet the first step in any process I could devise would, naturally, be the collection of data. To get a printed comic album into a computer, I must either scan it, or photograph it.
The easy, and reasonably cheap, way is to use a flatbed scanner, of which I am the happy owner. Purchasing a dedicated unit is outside the budget for such a project as this.
The first flatbed scan came out skewed, for obvious – with hindsight – reasons: as the album is placed face down on the scanner, and the lid closed, the resulting movement twists the page. Keeping the lid up is not a solution, as this lets in too much light.
A second attempt was made, taking extra care to place the album, and adding a weight to the lid. This resulted in a certain improvement to both edges and quality. Adding a sheet of black cardboard behind the page to be scanned helped further.
Both tests showed very clearly that the method was not feasible in the long run. From start to finish a page took approximately eight minutes to complete, due to the complicated ways I had to use in order to get the page flat and unharmed against the glass.
Since all this messing about increases the possibility of breaking both Rule Number 1 (see above) and Rule Number 5 [11], I decided not to travel down this route.
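The time budget behind that decision is easy to check. The sketch below uses the eight minutes per page measured above together with the page counts from footnote 11; the function name is made up for illustration.

```python
MINUTES_PER_PAGE = 8  # measured flatbed time per page, from the tests above


def scan_hours(pages: int, minutes_per_page: int = MINUTES_PER_PAGE) -> float:
    """Hours needed to flatbed-scan one album with the given page count."""
    return pages * minutes_per_page / 60


spirou_hours = scan_hours(61)      # one typical Spirou album
duck_hours = scan_hours(33)        # one Donald Duck comic
duck_days = 520 * duck_hours / 24  # ten years of weekly issues, scanning non-stop

print(f"Spirou album:           {spirou_hours:.1f} h")
print(f"Donald Duck issue:      {duck_hours:.1f} h")
print(f"520 Donald Duck issues: {duck_days:.0f} days around the clock")
```

Swap in a different per-page time to see how fast a scanner would have to be before the flatbed route starts to look sane.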
Update … might have to get back to you on this one. My tests have, in the past, been done on an Epson Perfection 4990. Currently looking at replacing an aging SnapScan E20 with a CanoScan LiDE 110. This may be a more fruitful path.
Dedicated units with a narrow frame exist – have a peek at the Plustek OpticBook series if you fancy. They tend to break Rule Number 3.
Another resource recently found is the lot over at DIY Book Scanner. Take two, and don’t call me in the morning.
The first camera experiment used the Samsung Galaxy Note; these images were never meant to be used. A cell phone camera has the advantage of low weight, yet produces poor quality.
Next up was the Canon 550D, an entirely different beast — excellent quality, but impossible to hold still for a macro shot — unless, of course, in «auto» mode, complete with flash and auto white balance. A sample ISO 400 shot provided quite generic, and very noisy, results.
For better quality, I decided on ISO 100, custom white balance per album, long exposure and — hardly surprising — a tripod. The latter proved a problem: a regular run–of–the–mill tripod is designed to hold the camera for shooting horizontally. Duh.
I made a few attempts at positioning the album vertically, but quickly concluded that creating a rig for this involved serious carpentry: it required a stand which could not only hold the album but keep it open. Let’s not even think about the light required.
Luckily, one could say, our old Lotus camera tripod had a crack. It’s plastic, and has provided years of good service – yet retirement is goood, preferably before the SO’s fine DSLR comes crashing down.
Via some fancy footwork on Google, a highly competent semi–professional friend [12] and a firm grip on the above test results, I found the Manfrotto 190XPROB, the lil’ brother of said friend’s 055 ditto. Among its many nice functions is the ability to tilt the centre column into a horizontal position, thereby giving me the chance to point the camera safely downwards.
And downwards opens up a host of possibilities: overhead lighting, gravity helping to keep pages open, the use of flat, easily assembled stand–alone surfaces [13], and stability.
The tripod is covered in Rule Number 3, and you can likely find a good combination deal with a head such as the 486RC2. Of course, I do cheat: I already have the camera [14] :)
For the preliminary tests I’ve picked a different album; one in no danger of falling apart, and sufficiently not–well–known to not create a furore if anyone ever reads this.
In short: Serge Rosenzweig and Bernard Dufossé’s 1983 «Les Cinq et le mystère de Roquépine», based on the book by Claude Voilier; a story Enid Blyton never wrote, but with characters, places and names she created. It was released by A/S Hjemmet Bladforlaget in 1984 as «De 5 på eventyr nr. 3: Tempel–Riddernes hemmelighet».
A Norwegian translation of a French comic based on an English children’s book series first published in 1942. It’s about as obscure as you can get, in other words.
Let’s pause for a moment, and backtrack to Rule Number Two: I want high quality above (almost) everything else. The initial reaction is, of course, then to scan the comic at the highest resolution [15] possible.
This turns out to be a Bad Idea: «quality» is a wibbly–wobbly kind of issue, with more pitfalls than US tax law.
The current maximum achievable by my scanner is 4800 dpi at 24–bit colour. Such a scan takes up to a quarter of an hour, the resulting file size is prohibitive (not for required storage space, but for the memory and processing power required to, for example, colour correct such a massive amount of data), and the end result isn’t particularly good.
«WHAT?» I hear you scream. Let me illustrate. The following scans are from a Donald Duck comic, no. 27 1979, published on the 3rd of July. They are included as PNG, to avoid the added complication of JPEG artifacts. That is a joy we’ll deal with later.
To start with, we have — in order — a crop from a 4800dpi scan, at 100% zoom, and a crop from a 600dpi scan, at 1100% zoom (to achieve roughly the same sized view–port):
You’ll notice that the difference in quality is quite clear.
However, I’m not about to dump raw scans onto a tablet. It is not practical to, say, put a 13366 by 11092 scan on a 1024x768 screen and hope the device CPU is up to a suitable rescale on its own.
So here are the two scans again, uncropped, but resized to 500x415:
The results this time may not be so easy to tell apart … for reference, the original scans were:
- 4800 dpi, 24 bit, 13366 x 11092, 425 MB TIFF
- 600 dpi, 24 bit, 1670 x 1386, 6.7 MB TIFF
So, yes, basically the 4800 dpi scan is not particularly better than the 600 dpi one, when it comes down to it. I’ve concluded that, for reasons of size, memory, processing time, and quality there is little point in exceeding the latter resolution. Which brings us back to the Canon 550D.
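The file sizes quoted for the two scans follow directly from the pixel counts. A minimal sketch, assuming uncompressed 24–bit data and counting a megabyte as 2^20 bytes (the helper name is mine):

```python
def uncompressed_mib(width: int, height: int, bits_per_pixel: int = 24) -> float:
    """Size of an uncompressed bitmap in MiB (2**20 bytes)."""
    return width * height * bits_per_pixel / 8 / 2**20


big = uncompressed_mib(13366, 11092)  # the 4800 dpi scan
small = uncompressed_mib(1670, 1386)  # the 600 dpi scan
pixel_ratio = (4800 / 600) ** 2       # 8x the resolution means 64x the pixels

print(f"4800 dpi: {big:.0f} MiB")
print(f" 600 dpi: {small:.1f} MiB")
print(f"pixel ratio: {pixel_ratio:.0f}x")
```

Eight times the resolution costs sixty–four times the data, which is why the memory and processing bill grows so much faster than the visible quality.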
A «normal», as if such a thing exists, album is roughly A4 or smaller in size — 210 × 297mm or 8.27 × 11.69 inches. Assuming we want to shoot the page as close to 1:1 as possible, and calculating with the Canon’s 18MP resolution we’ll get a DPI of ~525.
Home free. As if. In conclusion, however: the resulting image from a Canon 550D will be of high enough quality when viewed on a low–resolution tablet.
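For those who want to redo the DPI arithmetic for their own albums, here is a rough sketch. It assumes the 550D’s full frame of 5184 by 3456 pixels (a figure not stated above) and a page that fills the frame as far as its shape allows; the function names are made up for illustration.

```python
MM_PER_INCH = 25.4
SENSOR_PX = (5184, 3456)  # assumption: full-size image from the Canon 550D


def capture_dpi(page_mm, sensor_px=SENSOR_PX):
    """Best achievable DPI when the page fills the frame as far as it can."""
    page_short, page_long = sorted(page_mm)
    px_short, px_long = sorted(sensor_px)
    return min(px_short / (page_short / MM_PER_INCH),
               px_long / (page_long / MM_PER_INCH))


def page_short_side_mm(dpi, sensor_px=SENSOR_PX):
    """Short side of the largest page that can be captured at the given DPI."""
    return min(sensor_px) / dpi * MM_PER_INCH


a4_dpi = capture_dpi((210, 297))
width_at_525 = page_short_side_mm(525)

print(f"full A4 page: {a4_dpi:.0f} dpi")
print(f"525 dpi fits a page about {width_at_525:.0f} mm wide")
```

By this arithmetic a full A4 page lands a little short of 525 dpi, and that figure corresponds to a somewhat smaller page. Either way the result sits in the same neighbourhood as the 600 dpi scan, which is what matters for the conclusion.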
The nitty–gritty is getting gritty indeed, now. As the previous analysis showed, an 18MP digital camera is more than able to provide data for the project.
But, yet again, storing comics on a tablet in full size is just a waste of time — and storage. So while we’ll capture in as high as possible a quality, we’ll store more realistically.
At the time of writing, the highest–resolution tablet is the iPad (3), with 2048x1536 pixels on 9.7 inches – 264 dpi.
Next up — or down — is 1920x1200 on machines such as the Asus Infinity 700. On a 10.1 inch screen we get 224 dpi.
Following that is the 1280x800 of the Galaxy Note 10.1 – 149 dpi. This is possibly the most common resolution on Android tablets today.
Let’s look at the tablet we do have: the iPad 2. With 1024x768 pixels on 9.7 inches, this device sports 132 dpi. That, however, is quite acceptable for comics reading, much to our surprise. It opens the door to the Samsung Tab 2 7.0 (170 dpi, and (!) 345g) too.
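The dpi figures for the tablets come from the pixel diagonal divided by the screen diagonal. A small sketch for checking other devices; the 1024x600 resolution of the Tab 2 7.0 is my addition, as it is not given above.

```python
from math import hypot


def ppi(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixels per inch from the pixel dimensions and the screen diagonal."""
    return hypot(width_px, height_px) / diagonal_in


TABLETS = {
    "iPad (3)": (2048, 1536, 9.7),
    "Asus Infinity 700": (1920, 1200, 10.1),
    "Galaxy Note 10.1": (1280, 800, 10.1),
    "iPad 2": (1024, 768, 9.7),
    "Samsung Tab 2 7.0": (1024, 600, 7.0),  # assumed resolution
}

for name, (w, h, diag) in TABLETS.items():
    print(f"{name:18s} {ppi(w, h, diag):6.1f} dpi")
```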
For other reasons (weight, slipperiness, locked–down–ity, truly complicated and stressful file management), the iPad is not on the menu. But the Asus might be; it’s a different question for a different time.
This question is about final resolution vs. quality vs. file size vs. … For example: a test scan of an EPIX (No 3, 1984) album, where the raw TIFF data was 4663 by 6441 pixels at 24 bit, ended up as an 81MB PDF file when all was said and done. A 32GB MicroSDHC card will fit some 395 albums (!) that way. A 64GB MicroSDXC card will fit 790. The SDXC standard can theoretically reach 2TB … but I ramble.
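The card capacities are simple division. A sketch, assuming decimal gigabytes (1GB = 1000MB) and ignoring file system overhead:

```python
def albums_per_card(card_gb: int, album_mb: int = 81) -> int:
    """Whole albums of the given size that fit on a card of the given size."""
    return card_gb * 1000 // album_mb


for card_gb in (32, 64, 2000):
    print(f"{card_gb:5d} GB card: {albums_per_card(card_gb)} albums")
```

Change `album_mb` to see what a more aggressive compression setting, or a lower page resolution, would buy in terms of albums per card.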
Here’s a rum ’fing: setting up a rig in which to do comics scanning, even using a high quality camera and tripod, is Not An Easy Thing To Do.
The setup must be mobile – I’m not leaving the rig up with guests around, for example – and so fixed positions that can be easily replicated are a must. We want to get as close to the subject as possible, yet avoid curved lines at all cost. The less said about focus and shaking the better.
To get the latter right, we start by printing an ISO 12233 chart — or a reasonable facsimile. That’s one of these:
Such a chart is used to measure the resolution of digital cameras. The one I use is a no–cost version of the somewhat expensive ISO original.
While designed for resolution, the chart will also reveal various problems with fish–eye and barrel/pincushion distortion – and did. To cut a long story short: in the end I settled for the EF–S 55–250 lens at a focal length of 65mm, an exposure of 1/3 of a second, an aperture of f/13, and ISO 100. No, absolutely NO, flash [16].
To avoid shaking (long exposure, remember) and make it possible to actually adjust the rig without standing on a chair to see the viewfinder, I’ve used the DSLR Controller application for Android, which gives me remote view–and–control capability.
The resulting image is fairly good, and what is left of optical misfortune can be adjusted.
A final equipment list looks thus:
- A camera. The better the better :)
- A capable lens, preferably with zoom
- A tripod. Must be able to hold the camera horizontally
- Two steel bars, maximum 5 mm wide, to hold the page down
- Four clamps to hold the bars down
- A spirit level to adjust the camera
- One or more lamps mounted below the camera to eliminate shadows (from the bars)
- A partridge in a pear tree is recommended. Cats, not so much.
The capture process creates Canon RAW files at 24MB a pop. A 60+ page album would, then, result in a whopping 1.4GB of pages, each of which takes far–too–much–time to load. Like, no.
Manipulating the images using GUI tools is in direct violation of Rule Number 5 — a graphical interface is without exception slow, slow, slow, slow. Did I mention slow? Luckily we have better stuff available which will allow the actual camera–to–album handling to be automated.
Pop the memory card from the camera, mount it as a disk, navigate to the directory, and take it from there. From here on down the descriptions presume Linux or, alternatively, another Unix–based OS.
First we convert the images; a fairly easy process, but one which requires the dcraw software. Once installed – and make sure to get 9.16 or better – I wrap it and everything below in tcsh [17]:
dcraw -j -t 0 -T -w -W -q 3 $file
- -j (ensure 1:1 pixel mapping)
- -t 0 (do NOT rotate)
- -T (write TIFF with meta–data)
- -w (use camera white–balance)
- -W (don’t brighten the image)
- -q 3 (use AHD interpolation; quality == 3)
The above will produce ~52MB uncompressed TIFF images. So far we’re heading the wrong way! :) For most of the rest of the work I’ll use ImageMagick.
In order to keep the page absolutely flat, I use «pressure bars» – two metre–long, 5mm by 5mm steel bars which are clamped to the surface. These are slim enough to fit between the spine and the actual drawings, and broad enough to make sure the paper isn’t damaged. When buying, wipe them down as they may have machine oil on them. Then dry them, ’cause rust isn’t a good idea either.
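To see why the files balloon at this stage, here is a back–of–the–envelope sketch. It assumes the 550D’s 5184 by 3456 pixel frame (not stated above; dcraw’s actual output may differ by a few pixels) and 8 bits per channel.

```python
PAGES = 60
RAW_MB = 24                  # size of one CR2 file, as quoted above
FRAME_PX = (5184, 3456)      # assumption: Canon 550D full-size frame
BYTES_PER_PIXEL = 3          # 24-bit RGB

raw_total_gb = PAGES * RAW_MB / 1000
tiff_mib = FRAME_PX[0] * FRAME_PX[1] * BYTES_PER_PIXEL / 2**20
tiff_total_gib = PAGES * tiff_mib / 1024

print(f"RAW files for one album:  {raw_total_gb:.2f} GB")
print(f"one uncompressed TIFF:    {tiff_mib:.1f} MiB")
print(f"TIFF files for one album: {tiff_total_gib:.1f} GiB")
```

The RAW file is smaller than the TIFF because it holds one colour sample per photosite, compressed, whereas the converted TIFF carries three full channels per pixel.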
The bars show in the picture; see above, at the bottom. So, to crop them out I use:
convert -crop 4600x3263+240+49 $file $file:r_cropped.tiff
which leaves me with 4600x3263 pixels, cropped out from the top left corner 240 pixels in, 49 pixels down.
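When adjusting the crop for a different rig position, it helps to turn the geometry string into plain pixel coordinates. A minimal sketch that handles only the simple WIDTHxHEIGHT+X+Y form used here, not every geometry ImageMagick accepts; the function name is made up.

```python
import re


def crop_box(geometry: str) -> tuple:
    """Return (left, top, right, bottom) for a 'WxH+X+Y' geometry string."""
    match = re.fullmatch(r"(\d+)x(\d+)\+(\d+)\+(\d+)", geometry)
    if match is None:
        raise ValueError(f"unsupported geometry: {geometry!r}")
    width, height, x, y = map(int, match.groups())
    return (x, y, x + width, y + height)


left, top, right, bottom = crop_box("4600x3263+240+49")
print(f"keep columns {left}..{right}, rows {top}..{bottom}")
```

The right and bottom values must stay within the dimensions of the converted image, or the crop will come out smaller than requested.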
Rotated for illustration
Quite clearly this image is too dark to be of much use. Luckily there exists a collection of very nice scripts by Fred Weinhaus using ImageMagick for various purposes – among them ‹autowhite›:
autowhite -p 15 $file $file:r_white.tiff
where «-p 15» is a percentage value (pixels closest in colour to white) used to compute the average grey level per RGB channel. Adjust to fit.
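To make the idea behind the percentage concrete, here is a toy illustration in plain Python. It is NOT Fred’s script and makes no claim to match its internals; it only shows the general principle of averaging the pixels nearest to white and stretching each channel so that this average becomes pure white. All names are hypothetical.

```python
def white_gains(pixels, percent=15.0):
    """Per-channel gain that maps the average of the whitest pixels to 255."""
    # Rank pixels by squared distance to pure white (255, 255, 255).
    ranked = sorted(pixels, key=lambda p: sum((255 - c) ** 2 for c in p))
    keep = max(1, round(len(ranked) * percent / 100))
    nearest = ranked[:keep]
    averages = [sum(p[ch] for p in nearest) / keep for ch in range(3)]
    return tuple(255 / a if a > 0 else 1.0 for a in averages)


def apply_gains(pixels, gains):
    """Scale every channel by its gain, clamping at 255."""
    return [tuple(min(255, round(c * g)) for c, g in zip(p, gains))
            for p in pixels]


# Toy image: two yellowish "paper" pixels and eight dark "ink" pixels.
sample = [(200, 180, 160)] * 2 + [(50, 50, 50)] * 8
corrected = apply_gains(sample, white_gains(sample, percent=20))
print(corrected[0], corrected[-1])
```

A higher percentage pulls more of the page into the average, which is why the value needs adjusting per album: yellowed paper with little white margin behaves differently from a clean page.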
A clear improvement? Yes. Still, the colours do tend to get washed out. Fred to the rescue once more:
autocolor -m gamma -c together $file $file:r_colour.tiff
The resulting TIFF is 3200 pixels wide, and impossible for any current device to show without downscaling. Nicely future–proof, yes, but large; very large MB–wise. Since I always store the original RAW files, a rescale is in order.
So, while absolutely everything Apple does rubs me every wrong way there possibly is, I’ll rescale to the maximum possible width their 3rd gen. iPad can handle in portrait mode:
set scale = "1536x"
set page = 0
foreach file ( *_cropped_white_colour.tiff )
  if ( `expr $page % 2` == 0 ) then
    convert -filter Lanczos -resize $scale -rotate 270 $file pg_$page.tiff
  else
    convert -filter Lanczos -resize $scale -rotate 90 $file pg_$page.tiff
  endif
  set page = `expr $page + 1`
end
- «-filter Lanczos» picks the Lanczos filter [sic]
- «-resize 1536x» resizes to a width of 1536 pixels, but keeps the aspect ratio by adjusting the height
- «-rotate N» rotates by either 270 or 90 degrees, depending on whether it is a left– or right–hand page
Wouldn’t it be better, you might wonder, to rescale based on the maximum width possible on the target device?
I wondered the same, but tests showed peculiar results. Not only will a 768 pixel wide comic on the iPad 2 show smaller than 768, but the rendering is better with iPad 3 widths.
The latter is true for the Samsung Note 5.3 as well. It renders 800 pixels wide, but the more information going in (the larger the file), the more detail it can show.
Clearly I’m missing something, but … onwards. Since the CR2 files are kept, I can always reset and re–run the assembly script.
We are now left with a number of ~10MB TIFF files — and the gritty is about to get hairy.
In order to produce a single–file comic suitable for reading on a variety of platforms I need to pick a format. The content is non–reflowable; PDF is a good choice for what is effectively an image container. EPUB might have been a choice, but requires more overhead in terms of encapsulation.
The images are bitmaps, and PDF stores these as «dictionaries with an associated stream»; a structure of meta–data and a chunk of image 0s and 1s.
No surprise, then, that a 10 MB TIFF embedded in a PDF–file yields a 10 MB PDF. Here’s where the voodoo comes in. ImageMagick to the rescue:
convert pg_*.tiff -compress JPEG Title.pdf
where «-compress JPEG» does NOT mean the pages are stored as JPEG files, but rather that each image stream is compressed with the lossy DCTDecode [18] filter. It is important to note the word «lossy» here, people: the reason why the entire set of operations is performed on TIFF files is that TIFF is non–lossy. Only now, at the end, do you under … do we discard information.
A 60 page album will end up as an approximately 60MB PDF. Reading that on a modern tablet device is not a problem.
The finished script is included at the end of this article.
10 Wallet Friendly Home Book Scanners — how to digitize your book and mag collection without breaking the bank
Digitization of Books, Manuscripts and Other Documents
Digitization of Manuscripts — A Technical Point of View
Preservation Guidelines for Digitizing Library Materials
Library of Congress, —
Lino Manfrotto + Co., S.p.A., 2010
On de–curvature methods
Scanning Tips For Photos and Paper
Sven Neuhaus 2012
Correcting Lens Distortion in Digital Photographs
Wolfgang Hugemann, 2010
ImageMagick v6 Examples, 10th of January 2011
ImageMagick v6 Examples, 14th of January 2009
Fred’s ImageMagick Scripts
Fred Weinhaus, 2008–2012
ISO 12233 Test Chart
Stephen H. Westin, 21st of April 2010
Decoding raw digital photos in Linux
Readers ‹cook› manga, open up world of possibilities
Kanta Ishida, August 2010
1 If you don’t know the «Girl Genius» series already, be properly ashamed and go read up. This is about as Steampunk as you get without adding a clockwork Holmes and a coal–powered Watson.
2 From overreading. Yes. I know. Ideally one should treat one’s comics like fine Dresden porcelain. Get real :)
3 Support your local and un–local artists, guys. Don’t pirate, or we’ll have to discuss the issue over a knuckle sandwich with a side order of Odalscampari. Look it up.
4 Running madly down corridors, yes, what did YOU think they were doing?
5 This is called «jisui» — 自炊 — in Japan, meaning, roughly, «to cook for oneself». No, it isn’t done in the kitchen. Yes, it may be legal depending on your jurisdiction.
6 Donald Duck & Co., 1977, yes, the Norwegian translations by that very weird language–teacher. Reads like a dictionary.
7 I was 9 in ’77; give the kid a break!
8 For «tossed» read «very gently, very carefully, with infinite precision and tenderness, placed». Duh.
9 http://www.4digitalbooks.com/ — Stanford swears by them :)
10 http://diybookscanner.org/forum/viewtopic.php?f=1&t=1278 – HELL no.
11 A typical Spirou album has 61 pages. At 8 minutes a pop, that’s 488 minutes, or roughly 8 hours. A Donald Duck comic has 33 pages, that is 4.4 hours … but I have about 10 years of the latter, at 52 issues per year, say 520 comics. An estimated 95 days of work, provided I do absolutely nothing else. In other words: «no» :)
12 When he objects, don’t pay any attention. He’s far–more–competent–than–I :) Thanks, joern *hugs*
13 Yes, yes. A table.
14 Canon 550D borrowed from the SO. Thanks, J :) *kisses*
15 Please don’t give me grief on the topic of resolution. Just don’t. I’ll classify you an idiot if you for a moment imagine I have gone through all of this and NOT been aware of the issues. I refer you to http://www.rideau-info.com/photos/mythdpi.html – tho written by ANOTHER idiot. Consider what happens if you photograph a piece of paper of size X by Y. Exactly.
16 Sure, I could use a Canon MR–14EX Ring Flash. Got 500 quid to spare, mate? Point is: with the pressure bars in place, not even a ring flash would necessarily give the appropriate amount of light in the appropriate place.
17 «Why not bash⁈» I hear you cry. Here is how to remove the file extension in tcsh: $file:r
Here is how to do it in bash: ${file%.*}
Any more questions? :)
18 Not actually a JPEG encoding, but based on the JPEG standard. Important difference? Likely not.
What follows is the completed script, assembled, with most comments removed. Feel free to copy. (Yes, I am aware there is precious little error handling in there.)
#! /bin/tcsh
# ----------------------------------------------------------------------------- #
# - ccollate - assemble a PDF of pages from a photographed comic album        - #
# -                                                                           - #
# - by Tina Holmboe <email@example.com>, (ca) 2012                            - #
# ----------------------------------------------------------------------------- #

set version     = "1.0a0"
set gnice       = "/usr/bin/nice -n 9"
set raw_postfix = "CR2"
set crop        = "4600x3263+240+49"
set scale       = "1536x"

# Test the argument count; comparing an empty, unquoted variable against ""
# is a syntax error in tcsh.
if ( $#argv < 1 ) then
  echo Missing argument title
  exit 1
endif
set title = $1

echo Creating $title.pdf

set page = 0
foreach file ( *.$raw_postfix )
  # Zero-pad the page number so that pg_*.tiff globs in page order
  # (otherwise pg_10 sorts before pg_2).
  set pg = `printf "%03d" $page`

  printf " $file ... "
  $gnice dcraw -j -t 0 -T -w -W -q 3 $file
  printf " converted,"

  $gnice convert -crop $crop $file:r.tiff $file:r_cropped.tiff
  rm $file:r.tiff
  printf " cropped,"

  $gnice autowhite -p 15 $file:r_cropped.tiff $file:r_white.tiff
  rm $file:r_cropped.tiff
  printf " whitebalance adjusted,"

  $gnice autocolor -m gamma -c together $file:r_white.tiff $file:r_colour.tiff
  rm $file:r_white.tiff
  printf " colour adjusted,"

  # Even pages are rotated one way, odd pages the other.
  if ( `expr $page % 2` == 0 ) then
    $gnice convert -filter Lanczos -resize $scale -rotate 270 $file:r_colour.tiff pg_$pg.tiff
  else
    $gnice convert -filter Lanczos -resize $scale -rotate 90 $file:r_colour.tiff pg_$pg.tiff
  endif
  printf " rotated and rescaled."
  rm $file:r_colour.tiff

  set page = `expr $page + 1`
  printf "\n"
end

printf " assembling PDF\n"
$gnice convert pg_*.tiff -compress JPEG $title.pdf
printf "\n\ndone\n"