13 April 2008

#66. Dupe Finders and One-Liners

Welcome back, dear readers. This week your chef had to take his act out on the road again, despite his notable lack of wanderlust. Thus your chef was faced today with the choice of strolling around his hotel’s serene Japanese garden and exploring the surroundings in a new city, or spending many hours evaluating new utility software and writing his recommendations. Maybe this will give you a clue about his choice:

Eliminate Double Vision

Last week’s post led to an interesting discussion in the comments about software for detecting duplicate digital photos on your disk. Such programs try to identify duplicate pictures by examining their sizes or even the actual images. For those of you who missed it, I suggested to reader Jim that he try these programs:

DupDetector (free; compares images by content; apparently orphaned by its developer, and not certified for Vista)

VisiPics (free; compares images by content)

Image Comparer ($35; compares images by content)

Image File DeDuper (also known as JPeg DeDuper; free; compares by file size first, then by content)

Md5sums (free; command-line tool that only compares file checksums)

Reader Mike then commented that JPeg DeDuper is very fast with very few false positives, but misses some duplicates. He recommends DoubleKiller Pro ($20), which also has a free version.

I have not tried any of these programs, though I know I need to use one on my overgrown photo collection. I welcome your further comments on these or any others you know of.
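For the curious, here is a minimal sketch of the checksum approach as it might look in a Linux terminal. It is not how any of the programs above are implemented. It assumes GNU coreutils (md5sum, plus a uniq that understands -w and --all-repeated), and the function name find_dupes and the demo file names are made up for illustration. Because it compares checksums only, it catches byte-for-byte copies but not pictures that were resized or re-saved.

```shell
#!/bin/sh
# Sketch: list files whose contents are byte-for-byte identical.
# Assumes GNU coreutils; find_dupes is a hypothetical helper name.
find_dupes() {
    # $1 is the directory to scan
    find "$1" -type f -exec md5sum {} + |
        sort |
        uniq -w32 --all-repeated=separate   # compare only the 32-character checksum
}

# Demo on a throwaway directory with made-up files
demo=$(mktemp -d)
printf 'same picture\n'      > "$demo/a.jpg"
printf 'same picture\n'      > "$demo/copy of a.jpg"
printf 'different picture\n' > "$demo/b.jpg"

find_dupes "$demo"   # lists the two identical files, not b.jpg
```

Each group of identical files is printed together, with a blank line between groups, so you can decide for yourself which copy to keep.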

Update on Free Firewalls

Lifehacker recently asked its readers to name the best free software firewall. The results generally confirm my existing conceptions: Approximately 36% chose Comodo Personal Firewall. ZoneAlarm Free garnered about 23% of the votes, closely followed by Windows’ built-in firewall with 22%. Sygate Personal Firewall took 7%, and various others got about 1% of the votes each.

The big surprise for me in the Lifehacker survey was the 9% vote for “Fire-what? Don’t use one.” I doubt that any of those 9% are Tool Bar readers too. But if you are among them, I’m telling you now: Get a firewall!

Even the simple built-in Windows firewall helps prevent hackers and malware from breaking into your computer (and it might be working without your even being aware of it). More sophisticated firewalls, such as Comodo (which I use; see my remarks in posts #6, #47, #51, and #57), also stop malware that somehow gets into your computer from reaching out to the Internet and doing more harm. When you use such a firewall in conjunction with good antivirus, antispyware scanning, and host intrusion protection programs (the last also included with the Comodo firewall), you can rest assured that your computer is reasonably well protected.

I continue to recommend Comodo for its excellent performance in testing and its generally good interface. And Comodo continues to reward me with safety, but also to punish me with some very irritating habits – particularly the way it lays its messages right on top of each other so you can’t read or click on them. And yesterday for unknown reasons, Comodo apparently forgot many of the programs I trained it to recognize and accept, resulting in a blizzard of new pop-up questions that really tried my patience. (Come on, Comodo, you don’t recognize Windows Media Player any more?) So watch this space in coming weeks for my assessments of other firewalls.

And now let’s see what insights Linux grandmaster Mark Lautman has for us today….

Did You Hear the One About…

by Mark Lautman

A computer analyst said to a programmer, “You start coding. I'll go find out what the customer wants.”

“I haven't lost my mind; it's backed up on tape somewhere.”

This form of humor is called the “one-liner.” I got these examples from the fabulous collection at http://www.oneliners-and-proverbs.com/.

One-liners are great in Linux, too. For several posts I've been describing the “command line” and the “terminal window,” but I haven't exactly said what you can do with those things. For the next few weeks, I'll introduce some one-line commands that show what the terminal window can do for you.

If you maintain a Web site, you've probably come across the situation where you need to change one little thing in 100 HTML files. I've done my share of changes to relative directories, or even just replacing one word with another. One way of doing this is to open each cute little HTML file in a text editor, and do a find and replace. This works fine for the first 10 cute little HTML files, but after that they don't look so cute or little any more. You could write a Word macro, which cuts down the time quite a bit, but you'll need at least two lines to open and close the files.

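In Linux you can change all 100 files with a single command. For example, the following command replaces all instances of “Tool” with “Bar” (regardless of case) in all HTML files in a directory. The -i switch tells Perl to edit the files in place; without it, the result is only printed to the screen and the files stay as they were:

perl -p -i -e 's/Tool/Bar/ig' *.html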


I took the above example from Rice University’s Edit Your HTML Files with a One-Line Perl Program. You can find variations on this theme at that site. If there were a contest for the most valuable one-line command, this would be a sure winner.

My son tells me that real mammals have hair. I tell him that real HTML files start with some type of document declaration. Nobody does this, certainly not the big retail sites, but it’s a good practice. Below is an example of adding a document type as the first line of all HTML files in a directory; the declaration you want to add goes between the braces of q{}. (This example is based on a collection of one-liners at Perl One Liners.)

perl -i -ple 'print q{} if $. == 1; close ARGV if eof' *.html

Can you find the error in in this sentence? A posting at UNIX for Dummies Questions & Answers has a few single-line commands to find lines containing duplicate words, for example:

perl -ne 'print "$.: doubled $_\n" if /\b(\w+)\b\s+\b\1\b/'

The previous examples used Perl commands. Perl is my favorite language for abusing text files. There are other Linux utilities as well. Linux’s sed (Stream EDitor) is very popular. For example, have you ever received a double-spaced file, with an annoying empty line after every line of text? If you have sed, you can eliminate all those lines with an amazingly short command (suggested by Eric Pement), which simply deletes every second line:

sed 'n;d' filename.txt
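One caution: 'n;d' throws away every second line whether it is empty or not, so it suits only files that are strictly double-spaced. The sketch below, on two made-up temporary files, shows the difference and a gentler alternative, '/^$/d', which deletes only lines that are really empty.

```shell
#!/bin/sh
# Sketch: compare two ways of removing blank lines.
strict=$(mktemp)
loose=$(mktemp)
printf 'one\n\ntwo\n\nthree\n' > "$strict"   # a blank line after every line
printf 'one\ntwo\n\nthree\n'   > "$loose"    # blank lines only here and there

sed 'n;d' "$strict"     # fine: prints one, two, three
sed 'n;d' "$loose"      # oops: drops "two" and "three" and keeps the blank line
sed '/^$/d' "$loose"    # deletes only the empty lines: one, two, three
```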

If you haven't received annoying empty lines, you've probably received files with annoying leading spaces and tabs. The following sed command, offered by sed one-liners, is a solution just for you:

sed 's/^[ \t]*//' filename.txt
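Not every version of sed understands \t inside the brackets. If yours leaves the tabs in place, a variant that uses the POSIX class [[:blank:]] (spaces and tabs) may work better. The function name strip_leading below is made up for the demo.

```shell
#!/bin/sh
# Sketch: strip leading spaces and tabs with a POSIX character class.
strip_leading() {
    sed 's/^[[:blank:]]*//' "$@"
}

# Four test lines: leading spaces, a leading tab, a mixture, and nothing at all.
printf '   three spaces\n\ttab\n \t mixed\nnone\n' | strip_leading
```

All four lines come out flush against the left margin, and the spaces inside a line are left alone.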

[You’ll find a good introductory sed reference guide here. —JP]

Consultants, as we all know, get paid by the amount of work they do, not the quality. If you want to compare the number of words your consultants are giving you in text files, use the wc (word count) command:

wc -w `find . -name "*.txt"`
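The backtick form works as long as no file name contains a space; if one does, the shell splits it into pieces and wc complains that it cannot find them. A sketch of a variant that lets find hand the names to wc directly, using made-up demo files:

```shell
#!/bin/sh
# Sketch: count words in all .txt files, including names with spaces.
demo=$(mktemp -d)
printf 'one two three\n' > "$demo/short report.txt"
printf 'four five\n'     > "$demo/memo.txt"

# find passes the names straight to wc, so spaces in them survive intact.
# The trailing + runs wc once on the whole batch, which also yields a total.
find "$demo" -name '*.txt' -exec wc -w {} +
```

Here wc reports 3 words for one file, 2 for the other, and a total of 5.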

Below is a list of word counts from the Tool Bar's recent posts.

Next week we'll look at some Perl modules that are available for special tasks.

TOOL BAR AND GRILL FREE OFFER (for one week only): If you find yourself doing a repetitive task on text files, send me a description and I'll try to automate it using Linux commands. Contact me through the Tool Bar at jonathanstoolbar@gmail.com. —Mark Lautman

That wraps up another great Tool Bar. Do come back next week and every week for more great tips and software recommendations, and don’t forget to bring all your friends. And please help keep this blog going by visiting our advertisers.

Share your thoughts, and take advantage of Mark’s free offer, by clicking on “comments” below or writing to jonathanstoolbar@gmail.com.

11 comments:

  1. Hi Jonathan:

    To replace a word or phrase in many HTML files, I still do it the old fashioned way by loading up all of the relevant files simultaneously in NoteTab [Light or Pro, doesn't matter] and executing a global search & replace. However, just a few minutes ago, I stumbled across "Quick Search & Replace" a freeware utility from
    http://www.searchreplacetext.com/quick_search_replace.html
    .
    I haven't tested it yet but it does look intriguing...
    ________
    FinibusBonorum@yahoo.com

  2. Hello. This post is likeable, and your blog is very interesting, congratulations :-). I will add in my blogroll =). If possible gives a last there on my blog, it is about the TV de LCD, I hope you enjoy. The address is http://tv-lcd.blogspot.com. A hug.

  3. Dear Finibus: Thank you for writing in again, and for suggesting Quick Search & Replace. I have heard of it, but have not yet tried it either. Let's give it a whirl!

  4. Dear TV de LCD: Thank you, too, for taking the time to write in. I am grateful for your compliments, and I hope my blog is helpful to you. I wish I could read your blog, but regrettably I don't know Portuguese.

  5. In Linux you can change all 100 files with a single command.

    Using perl, you can change all 100 files with a single command.

    Perl is just an application, available for a variety of operating systems.

  6. John,

    In regard to your firewall and malware portion of this edition of Tool Bar & Grill, here is a great site I know of that has a sticky thread on how to protect yourself using firewalls, real-time blocking tools, and virus protection.

    http://forums.majorgeeks.com/showthread.php?t=44525

    They also have super helpful anti-malware members who have helped me disinfect my nephew's computer and my mom's computer using their forums.

    They suggest and list all the tools you need to stay protected, with 90% of them being free.

    Check them out. Here is the link to the main Malware Forum page.

    http://forums.majorgeeks.com/forumdisplay.php?f=35

  7. Pauliwood, thank you indeed for your helpful comment. I just checked out the MajorGeeks forum you recommended, and indeed it contains much useful info. However, it also appears to be somewhat out of date by now. I have covered some of the software mentioned there in previous blog posts.

  8. Hey John, thanks for the reply. Not sure what you mean by out of date, as they update their recommended software as it changes. They used to use HijackThis, and now that it is owned by another group, they have updated their prevention and removal guides.

    They always recommend using the update feature of the software they recommend.

    Their strength is their help forums, which are very good at assisting PC users, from beginner to advanced intermediate, in removing spyware, malware, etc.

    Anyhow, good job on your blog, keep up the good work!!

  9. VisiPics is a stable, unbloated program that simply does its job - finding duplicate images, including those edited and saved as different file types and/or sizes - very, very well. Without even changing the default settings, I was able to clear out over 1 GB of duplicate images with a minimum of hassle. I, of course, expressed my appreciation to its developer in a tangible way (i.e. with a cash donation) as well as with written feedback and suggestions. The developer responded with a personal thanks for the feedback and donation and added my requested image formats to the Road Map for future development. I highly recommend VisiPics and intend to make another pass through my image collection once my requested RAW formats are supported.

  10. Diana, thank you very much for sharing your endorsement of VisiPics. I'll try it out for my own use, too. And I applaud you for showing your appreciation to the developer with a donation. I encourage all readers to do the same for any free software you like and use.

  11. Hi Jonathan... I knew I would find what I wanted here... well, almost everything.. love the picture dup finder.. I used a horrible little program that took ALL of my music and RENAMED them!!! And organized them into stupid folders.. yuck, what a mess it made! Luckily, in Windows 7, I just restored the music folder and voila!

    Now, what I am looking for (and maybe you can help) is along the lines of dupes...

    1. Outlook Contact dup finder
    2. Music dup finder
    3. "All files" finder.. to find programs, pdfs, anything on the computer that is lurking in more than one folder...

    Any ideas?

    Thanks,
    Amnon
