I’ve been trying to shift much more of the paperwork in my life into the digital world, but I was very keen that filing a bit of paper electronically should be as easy as putting it in a folder in the filing cabinet. “Wouldn’t it be nice”, I thought, “if the only thing I had to do was type a name or a few keywords and everything else happened automatically?”
So I built a system which did just that. This video describes in some detail how the script is set up. You may want to use the full-screen and HD options to make things more readable. If you’re less interested in the details and would just like to see it in action, watch the first couple of minutes and then skip to about 13:30.
One thing I don’t talk about in the video is the fact that Hazel rules can also look at the contents of the file. So, once the document has been OCRed, the automatic filing can happen based on words that actually occur on the paper — it might detect your car’s registration number (licence plate), for example, in a document and know to file that under ‘car stuff’ — which I think is very cool.
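To make the idea of filing by content a little more concrete, here is a minimal Python sketch of the same logic outside Hazel. It is not how Hazel works internally, and the registration number, keywords and folder names are all made up for illustration. It assumes the OCRed text has already been extracted from the PDF into a string.

```python
import re

# Hypothetical keyword-to-folder table: the registration number, keywords
# and folder names are invented for this example.
CONTENT_RULES = {
    "Car stuff": ["AB12 CDE", "MOT certificate"],
    "Tax": ["HMRC", "self assessment"],
}


def normalise(text):
    """Lower-case the text and keep only letters and digits, so that OCR
    quirks in spacing or punctuation do not prevent a match."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def folder_for_text(ocr_text, rules=CONTENT_RULES, default="Unfiled"):
    """Return the first folder whose keywords occur in the OCRed text,
    or `default` if nothing matches."""
    haystack = normalise(ocr_text)
    for folder, keywords in rules.items():
        if any(normalise(keyword) in haystack for keyword in keywords):
            return folder
    return default
```

Stripping spaces before comparing is a deliberate trade-off: it tolerates OCR that reads "AB12 CDE" as "AB12CDE", at the cost of the occasional accidental match across word boundaries.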
Some further links:
- David Sparks’ Paperless ebook
- Hazel, and some more things you can do with it.
- PDFpen, from Smile Software.
- Fujitsu ScanSnap scanners
Very cool and useful. The only thing that irks me a little is the need to rename files to fit specific syntactic rules (I suck at this and would soon forget them). Do you think it would work if the rules were set up to associate keywords in the document with the filesystem namespace structure? (So, for example, a document with the keywords HMRC, TAX, filing would automatically be filed in the appropriate place.)
Hi Rip –
Yes, I sometimes put some flexibility in my rules so they’ll match any of ‘bank’, ‘lloyds’, ‘lloydstsb’, for example.
But the OCR is also a good way to do it, especially for things which are likely to occur more than once in the document, so if it fails to get recognised in the logo at the top it’ll still be found in the ‘registered office’ details at the bottom, for example.
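As a rough illustration of the flexible name matching described above, here is a small Python sketch. The three aliases come from the example; the "Bank" folder name and the function itself are assumptions for illustration, not the actual Hazel rules.

```python
import re

# Aliases taken from the example above; the folder name "Bank" is assumed.
ALIASES = {
    "Bank": {"bank", "lloyds", "lloydstsb"},
}


def folder_for_name(filename, aliases=ALIASES):
    """Return the folder whose aliases include any word in the filename,
    or None if no alias matches."""
    words = set(re.findall(r"[a-z0-9]+", filename.lower()))
    for folder, names in aliases.items():
        if words & names:
            return folder
    return None
```

Matching on whole words rather than substrings means a file called "riverbank walk.pdf" would not be treated as a bank statement.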
Great video! I have a Fujitsu ScanSnap; I scan to a gdrive folder, which Evernote watches and pushes to a notebook called inbox. I then manually rename and tag. I like the additional automation you’ve got in there.
“Scandrop” will scan directly to Evernote. You can submit multiple scans into a single note, and break up multiple page scans into multiple notes. It allows you to tag and post direct to any notebook. I like it a lot. All I need now is an AppleScript to open my mail, put the envelope in the stove (to light the evening fire), place the contents in my scanner and start Scandrop.
Really good and helpful video, some of which I have incorporated into my paperless workflow.
Question: I tend to scan lots of documents at a time, letting the stuff build up in a “real” folder on my desk for a week or two and then scanning everything into a scan inbox all at one time. I really like your idea of using 3 folders, rather than having the scanned document go directly to the Action folder. I had been OCRing into the Action folder with the software that came with my Fujitsu ScanSnap, but was finding that the recognition wasn’t very good. It improved tremendously once I switched to using PDFpen. And automating it as you did makes the process painless.
But here’s the issue I don’t know how to handle: when I scan several documents, one right after the other, into the scan inbox, things start to get out of control. There are boxes popping up all over the screen, asking me to rename all the files I’ve just scanned. I can see each document open in Preview but I can’t tell from the renaming boxes which file I’m actually renaming! When I scan 1 at a time, rename it before scanning the next document, all works great. Got any ideas about that? Do you scan 1 at a time and rename it before scanning the next one?
Any ideas you can throw my way would be much appreciated.
Hi Robin –
Mmm… yes – that’s a challenge – I don’t think there’s a way to tell Hazel not to process the next file until it’s completely finished with the current one. I suspect you’d have to do some more coding for that – detect when the folder has been changed and include a loop in the script which goes through the files in the folder one at a time. Or you could move items from the inbox folder to a ‘renaming’ folder, but only do that when the renaming folder is empty. Or, you could use the renaming features of Preview itself, but this is made more complicated by the fact it’s not scriptable. You might be able to do something like bind a keystroke to File > Rename and then use AppleScript to send that keystroke to Preview…
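The second suggestion above (only hand over the next file when the ‘renaming’ folder is empty) could be sketched roughly like this in Python. This is an untested-in-anger illustration rather than part of the actual workflow: the function name and folder arguments are hypothetical, and something else (a timer, a folder action) would need to call it repeatedly.

```python
import shutil
from pathlib import Path


def visible_files(folder):
    """List the non-hidden files in a folder, oldest first.
    Hidden files such as .DS_Store are ignored."""
    files = [p for p in Path(folder).iterdir()
             if p.is_file() and not p.name.startswith(".")]
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))


def feed_next(inbox, renaming):
    """Move the oldest file from `inbox` into `renaming`, but only if
    `renaming` is currently empty. Returns the new path, or None if
    nothing was moved."""
    if visible_files(renaming):
        return None  # still busy with the previous document
    waiting = visible_files(inbox)
    if not waiting:
        return None  # nothing left to process
    target = Path(renaming) / waiting[0].name
    shutil.move(str(waiting[0]), str(target))
    return target
```

Called every few seconds, this would release one scan at a time, so only one rename prompt is ever on screen.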
I found a similar problem shortly after making the video: it was that Hazel couldn’t always tell when my (somewhat elderly) ScanSnap software had finished creating the file. I suspect this is a problem on multi-page documents – the scanning software probably (foolishly) opens and closes the file repeatedly. So my script would prompt me to rename an incomplete file, and then all sorts of confusion could follow. (All my tests were with single pages which worked fine!)
One solution is to modify the Hazel rules so that they only kick in when the last-modified time of the file is more than a minute ago, but that hardly makes for an interactive experience. In the end, my solution was not to use Hazel for the very first stage. Instead, I get the scanning software to call my renaming script directly, when it knows it has finished scanning. The script puts the resulting file into the ‘OCR before Action’ folder, and Hazel takes it from there. All works happily now.
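For anyone doing the equivalent check in their own script rather than in a Hazel rule, the "last modified more than a minute ago" test mentioned above might look something like this in Python. The function name and the `now` parameter (there only to make testing easy) are illustrative choices, not anything Hazel or the ScanSnap software provides.

```python
import time
from pathlib import Path


def is_settled(path, quiet_seconds=60, now=None):
    """Return True if the file was last modified more than
    `quiet_seconds` ago, i.e. the scanner has probably finished with it."""
    if now is None:
        now = time.time()
    return now - Path(path).stat().st_mtime > quiet_seconds
```

As noted above, the cost of this approach is the delay: a file is never picked up until it has been left alone for the whole quiet period.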
Thanks for the response.
Not being very well versed in coding (like knowing nothing!) I think I’ve come up with a solution that works for me. It’s not very pretty, and it does add an additional step, but it fixes the problem. I scan everything in at once (so at least I don’t have to sit at the scanner for very long), not into the Scan Inbox folder but into a folder I’ve called “Holding”. Once everything is scanned, I move 1 file at a time from Holding into the Scan Inbox and let it do its thing. Once the file has been renamed and moved on to PDFpen, I can then move the next file into the Scan Inbox. A bit clumsy but it works.
I also have a fairly old Fujitsu scanner with matching software, but I’ve never encountered the issue you’ve run into, even with lengthy documents that run to lots of pages. Maybe that’s because my scanner and software are Windows-based (shame on me, I know), left over from before I became a Mac convert. I do almost all of my work on a MB Air, but have an iMac in my office that has Boot Camp on it, so rather than buying a new scanner, I just scan to Dropbox via Windows. Works fine. What would I do without Dropbox??
Love reading your blog BTW.
That seems like a pretty good solution to me!
Great video. I have been meaning to do something like this for a while now.
Wondering if you would be OK sharing soft copies of the Automator workflow and AppleScript. Being lazy here! 🙂
I went ahead and created the scripts. Thanks again for sharing your knowledge. Will follow the blog!
Hi Sam –
Yes, I think that’s best – I’d be happy to post the scripts, but you’d probably want to tweak them for your own folders etc anyway.
I’ve made a couple of other tweaks, which I should write up soon…
All the best,
Do share the tweaks that you mention! I am glad I read the comments – I am going to try launching the first Rename application from the ScanSnap software. I also have the older model – looks like yours in the video. 🙂
Thanks again for sharing!
Thank you for sharing this one. I am looking forward to your updated workflow, which you described in one of the recent MPU shows. I really like the idea of having a pipeline of steps.
I think the AppleScripts broke with the new OS.
I wish I knew the scripts well enough to troubleshoot.
Just wondering if you are still using this system and whether you updated the scripts.
Hi Sam –
I’ve probably modified the scripts a bit since then… I’ll try to do an update at some point, but yes, the basic system works for me under Yosemite.
There’s also some related stuff in my video clip here:
but I don’t think I went into much detail about the actual scripts…
Do you know where it’s actually breaking for you?
All the best,