Saturday, January 24, 2009

What I want ... a digital filing cabinet

I have a dream. And so does Mark Dalton. And so do lots of people when you search for this. And hopefully by sharing it, someone will come to my rescue and make the world a better place. 
What is this huge dream .... ? It's a filing cabinet that does everything for you. 
I have too many old bills and letters that I don't want to throw away, but no space left in my filing cabinet. Having decided that keeping them in paper form was too inefficient, I decided to scan them, but this is so slow that I put it off. Let me go through the process so you feel my pain.

Plug in the scanner (I can't leave it plugged in because it seems to interfere with my boot sequence).
Wait for the Lexmark software to notice it.
Pick up a bill (2 sheets).
Open the lid.
Put sheet 1 on the scanner. 
Carefully line it up by hand with the glass.
Close the lid.
Go to "scan" in the software.
Choose advanced settings.
Choose "scan as A4".
Choose "more than one page".
Choose OCR.
Click OK.
Choose "send scanned image to file".
Click Scan Now.
In the file dialog choose type PDF (because it forgets this is what I always choose).
Choose my USB drive (because it always defaults to "my documents" on the C drive).
Click through to an appropriate directory.
Choose a file name (which probably involves taking the page off the scanner so I can check the date and add it to the file name). Put the page back on the scanner, close lid.
Sit and wait while it scans (only 30s, but not much else I can do).
Open lid.
Remove page, and place on the floor for recycling or shredding.
Place page 2 carefully on the glass.
Close lid.
Click scan now.
Wait again, another 30s.
Open lid.
Remove page.
Close lid.
Click "No I don't want any more pages".
Wait for OCR to complete.
Stop typing this blog post while Abbyy FineReader (OCR software) steals my keyboard focus.
Hope I haven't typed anything that would interrupt the OCR process.
Breathe a sigh, wish I had those 3 minutes of my life back, and decide to put off scanning the rest of the pile for another day.

You probably didn't want to read that, but I didn't want to do it. Lots of these steps could be improved in small ways, but let me dream the big dream.
Imagine a box the size of a laptop, with something like the lid of a professional photocopier on top, but it only needs to be A4 size. I walk up to it, put the document on the feeder, and hit a big green button. All the sheets feed through the feeder in a second or two, and I either walk away leaving it in the "to be recycled" pile or collect it and shred it.
The machine scans both sides at maximum resolution and in colour (after all, storage is free) and saves the result to the hard drive as an incoming image. Maybe from my fingerprint on the button it knows it is me scanning it. Then in the background a number of things happen. After all, this box is idle most of the time.
It runs the document through OCR. It auto-rotates it to the correct orientation, even sorting out pages I fed through upside down or back to front.
From this it works out whether the scan is better stored as a photo (JPEG) or a PDF. If a JPEG, it adds appropriate tags from the date of scanning, and maybe some image recognition. If a PDF, it merges the OCR text and image together and stores the result on the hard drive.
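One rough way the box might make that photo-or-document call is to look at how much real text the OCR step found. The sketch below is only an illustration: the function name `choose_format` and the 20-word threshold are made up for this example, not taken from any real scanner software.

```python
def choose_format(ocr_text: str, min_words: int = 20) -> str:
    """Guess whether a scan is a text document or a photo from its OCR output.

    Hypothetical heuristic: min_words is an arbitrary threshold chosen
    for illustration, not a tuned value.
    """
    # Only count tokens that contain at least one letter, so stray
    # punctuation that OCR tends to "find" in photos is ignored.
    words = [w for w in ocr_text.split() if any(ch.isalpha() for ch in w)]
    return "pdf" if len(words) >= min_words else "jpeg"
```

A real product would presumably combine this with image analysis, but even a word count would get most bills and holiday snaps into the right pile.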
The built-in Google Desktop-like functionality in the box indexes it. It also does "more like this" analysis similar to Google Images and Google web search.
It then encrypts it for storage, so if the machine gets broken into I don't lose confidential information, like I would with a filing cabinet. The key is securely stored with my password, but optionally recoverable, depending on whether I think the risk of forgetting or the risk of the service being compromised is greater. The input queue is then securely wiped of the original.
It connects to a free online digital signing service (through its wireless network connection to my home network) and signs the document with the date, so I can prove in court I had it when I scanned it.
Most importantly, I don't have to wait.
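The "press the button and walk away" part is essentially a work queue with a background worker. Here is a minimal sketch of that idea, assuming placeholder stages: the stage names and `press_green_button` are invented for illustration, and each stage only records that it ran rather than doing real OCR, encryption, or signing.

```python
import queue
import threading

# Hypothetical stage names, in the order described above.
STAGES = ["ocr", "auto_rotate", "choose_format", "index", "encrypt", "timestamp"]

def run_stage(name, doc):
    # Placeholder: a real stage would do the actual work here.
    doc["done"].append(name)

incoming = queue.Queue()   # scans waiting to be processed
finished = []              # documents that went through every stage

def worker():
    while True:
        doc = incoming.get()
        if doc is None:            # sentinel: shut the worker down
            incoming.task_done()
            break
        for name in STAGES:
            run_stage(name, doc)
        finished.append(doc)
        incoming.task_done()

def press_green_button(pages):
    """Queue a scan and return straight away; nobody waits for OCR."""
    doc = {"pages": pages, "done": []}
    incoming.put(doc)
    return doc

threading.Thread(target=worker, daemon=True).start()
press_green_button(["bill-page-1", "bill-page-2"])
incoming.join()  # only for this demo, so the result can be inspected
```

The point of the design is that `press_green_button` returns immediately, and the slow steps happen whenever the box gets round to them.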
Later on I come back and want to find my bill. I connect to the machine from my laptop. Did I mention the cabinet connects wirelessly and seamlessly to my home network? I go to its web interface (secured appropriately), and look for "gas bill".
It finds a number and presents them in chronological order, most recent first. 
It also offers "more like these" functionality. However it does it, I can always find my document. I can tag it, like I can with my Gmail, to make sorting easier, but I don't really need to.
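The "gas bill" lookup could be as simple as matching the query words against the OCR text and sorting by date, newest first. This is a toy sketch with made-up sample data and a hypothetical `search` function; a real index would be far smarter about ranking.

```python
from datetime import date

# Made-up index entries: (date on the document, file name, OCR text)
documents = [
    (date(2008, 10, 3), "gas-2008-10.pdf", "Gas bill for October 2008"),
    (date(2009, 1, 5), "gas-2009-01.pdf", "Gas bill for January 2009"),
    (date(2008, 12, 1), "phone-2008-12.pdf", "Phone bill for December 2008"),
]

def search(query, docs):
    """Return documents whose text contains every query word, newest first."""
    terms = query.lower().split()
    hits = [d for d in docs if all(t in d[2].lower() for t in terms)]
    return sorted(hits, key=lambda d: d[0], reverse=True)

for when, name, _ in search("gas bill", documents):
    print(when, name)
```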
If I'm worried about the hard drive, I can synchronize it with a directory on my laptop, or stick a DVD-R or CD-R into the box's built-in drive, which automatically burns me a backup copy. I can restore from that seamlessly if I need to replace the drive, and it merges in rather than overwriting.
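"Merges in, not overwrites" could mean something like the following: copy everything from the backup that the archive doesn't already have, and leave existing files alone. This is a sketch under that assumption; `restore_merge` is a hypothetical name, and it compares files by path only, not by content.

```python
import shutil
from pathlib import Path

def restore_merge(backup_dir, archive_dir):
    """Copy files from a backup into the archive without overwriting anything."""
    backup_dir, archive_dir = Path(backup_dir), Path(archive_dir)
    restored = []
    for src in sorted(backup_dir.rglob("*")):
        if not src.is_file():
            continue
        dest = archive_dir / src.relative_to(backup_dir)
        if dest.exists():
            continue  # keep whatever is already in the archive
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        restored.append(dest)
    return restored
```

Calling `restore_merge("/media/dvd", "/archive")` (both paths are placeholders) would return the list of files it actually copied, which makes it easy to show the user what came back from the disc.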
But most of the time all I do is stick in some sheets and press a button.
Google, Canon, Lexmark, Xerox, Apple - where are you? When can I have one? All of this functionality already exists, it just needs a good designer (are you listening, Jonathan Ive?) to stick it all together.

P.S.
This post deserves a confession. I had a previous job working for a Canon research lab in Human-Computer Interaction. Why didn't I think of this then?