listing duplicate files

Charles Howse

Nov 28, 2005, 2:21:49 PM

I need to find all duplicate filenames in a directory and subdirectories.
I would like to know how to use find to list *only* the last part of a
filename.

$ find . -name '*.jpg'
./iPhoto Library/2002/09/30/823167-R1-00-25.jpg
./iPhoto Library/2002/09/30/823167-R1-01-24.jpg
./iPhoto Library/2002/09/30/823167-R1-02-23.jpg
./iPhoto Library/2002/09/30/823167-R1-03-22.jpg
./iPhoto Library/2002/09/30/823167-R1-04-21.jpg
./iPhoto Library/2002/09/30/823167-R1-05-20.jpg
./iPhoto Library/2002/09/30/823167-R1-06-19.jpg
./iPhoto Library/2002/09/30/823167-R1-07-18.jpg
...

I would prefer to see:

823167-R1-00-25.jpg
823167-R1-01-24.jpg
823167-R1-02-23.jpg
823167-R1-03-22.jpg
823167-R1-04-21.jpg
823167-R1-05-20.jpg
823167-R1-06-19.jpg
823167-R1-07-18.jpg
...

Can't use 'cut', don't know how many fields to cut from the output. Some
may have 5 '/' as above, some may have more or less.
Haven't made any progress using 'sed', might work...?

Any ideas?

Thanks,
Charles

Chris F.A. Johnson

Nov 28, 2005, 2:47:55 PM

Use awk:

awk -F '/' '{ print $NF }'

To count the instances of each:

awk -F '/' '{ ++f[$NF] }
END { for (file in f) printf "%5d %s\n", f[file], file }'
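To see what the counting version prints, here is a small sketch that feeds it a few made-up paths instead of real find output. The sample paths and the variable names are invented for illustration; the trailing sort is only there because awk does not promise any particular order for "for (x in array)".

```shell
# Hypothetical sample paths, standing in for the output of find.
paths='./a/one.jpg
./b/one.jpg
./b/two.jpg'

# Count how often each basename (the last /-separated field) occurs.
# Note that the loop variable "file" is printed, not the array "f".
counts=$(printf '%s\n' "$paths" |
  awk -F '/' '{ ++f[$NF] }
       END { for (file in f) printf "%5d %s\n", f[file], file }' |
  sort)

printf '%s\n' "$counts"
```

Any name with a count above 1 is a duplicate filename somewhere in the tree.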

--
Chris F.A. Johnson, author | <http://cfaj.freeshell.org>
Shell Scripting Recipes: | My code in this post, if any,
A Problem-Solution Approach | is released under the
2005, Apress | GNU General Public Licence

Michael Tosch

Nov 28, 2005, 3:05:20 PM

find . -name '*.jpg' |
awk -F / '{print $NF}'

List duplicate files:

find . -name '*.jpg' |
awk -F/ 's[$NF]++==1{print $NF}'

With a counter:

find . -name '*.jpg' |
awk -F/ '{s[$NF]++}END{for(i in s)if(s[i]>1)print s[i],i}'
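The "s[$NF]++==1" test in the second command may look odd at first. The post-increment yields the old count, so the condition is true only the second time a name is seen, and each duplicated name is printed exactly once no matter how many copies exist. A minimal sketch with invented sample paths:

```shell
# Hypothetical sample: one.jpg appears three times, two.jpg once.
# s[$NF]++ evaluates to 0, 1, 2 for the three one.jpg lines, so the
# pattern matches only on the second occurrence.
dups=$(printf '%s\n' ./a/one.jpg ./b/one.jpg ./c/one.jpg ./c/two.jpg |
  awk -F/ 's[$NF]++ == 1 { print $NF }')

printf '%s\n' "$dups"
```

Here one.jpg is reported once and two.jpg not at all.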

--
Michael Tosch @ hp : com

xicheng

Nov 28, 2005, 4:16:02 PM

you may try this:

$ find . -name '*.jpg' -exec basename {} \;

XC

serrand

Nov 28, 2005, 6:13:17 PM

a very slow solution...

for i in `find . -name '*.jpg' -exec basename {} \; ` ; do for j in
`find . -name "$i" -print | wc -l`; do if test "$j" -gt 1; then printf
"%s : %s\n" "$j" "$i" ; fi; done done

Any more ideas?
It would be nice if we could also see the full paths of the duplicates (directories...)

xavier

Frank Dietrich

Nov 28, 2005, 5:31:29 PM

Hi Charles,

Charles Howse wrote:
> I would like to know how to use find to list *only* the last part of a
> filename.
>
> $ find . -name '*.jpg'
> ./iPhoto Library/2002/09/30/823167-R1-00-25.jpg
> ./iPhoto Library/2002/09/30/823167-R1-01-24.jpg

> ...
>
> I would prefer to see:
>
> 823167-R1-00-25.jpg
> 823167-R1-01-24.jpg

find . -name '*.jpg' -printf "%f\n"
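Note that -printf is a GNU find extension, so it may be missing on other systems (the find shipped with Mac OS X does not have it). Since the original question mentioned sed, here is a rough sketch of a sed filter that does the same job on any find output. The function name strip_dirs is made up for this example.

```shell
# Strip everything up to and including the last "/" from each path.
# "|" is used as the s-command delimiter so "/" needs no escaping;
# ".*" is greedy, so it consumes all directories, however many there are.
strip_dirs() {
  sed 's|.*/||'
}

# Two sample paths of different depth, one taken from the question.
names=$(printf '%s\n' \
  './iPhoto Library/2002/09/30/823167-R1-00-25.jpg' \
  './top.jpg' | strip_dirs)

printf '%s\n' "$names"
```

In real use it would sit at the end of the pipeline, as in: find . -name '*.jpg' | strip_dirs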

Frank

xicheng

Nov 28, 2005, 7:39:08 PM

> a very slow solution...
>
> for i in `find . -name '*.jpg' -exec basename {} \; ` ; do for j in
> `find . -name "$i" -print | wc -l`; do if test "$j" -gt 1; then printf
> "%s : %s" "$j" "$i" ; fi; done done
>
> Some more ideas?
> It should be nice if we could see filenames of duplicates (directories...)

To find duplicate file names, why not try:
find . -name '*.jpg' -exec basename {} \; | sort | uniq -d
or find the number of occurrences for each file:
find . -name '*.jpg' -exec basename {} \; | sort | uniq -c
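The sort step matters because uniq only compares adjacent lines. A small sketch with an invented, unsorted list of names shows the difference between the two options; the awk at the end is only an assumption-free way to normalise the padding that uniq -c puts in front of the count, which varies between implementations.

```shell
# Hypothetical list of basenames, deliberately unsorted.
names='b.jpg
a.jpg
b.jpg
c.jpg
a.jpg
b.jpg'

# uniq -d: print one copy of each line that occurs more than once.
dup_names=$(printf '%s\n' "$names" | sort | uniq -d)

# uniq -c: prefix every distinct line with its count.
name_counts=$(printf '%s\n' "$names" | sort | uniq -c | awk '{ print $1, $2 }')

printf '%s\n' "$dup_names"
printf '%s\n' "$name_counts"
```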

XC

Ángel

Dec 7, 2005, 8:15:47 PM

Charles Howse <m...@privacy.net> wrote:
> I need to find all duplicate filenames in a directory and subdirectories.

How about using fdupes?

--
Saludos,
Ángel

mig...@yahoo.com

Dec 8, 2005, 6:25:14 PM

This solution would work on every unix box, as it does not use any
'modern' features or gnu features:

a) this will list all the files in all subdirectories:

find . -name '*.c' -exec basename {} | sort|uniq -c|sort -nr|grep -v ' 1'|while read n f; do find . -name $f; done

b) this will show how many duplicates along with the filename itself:
find . -name '*.c' -exec basename {} | sort|uniq -c|sort -nr|grep -v ' 1'

Etienne Marais

Dec 8, 2005, 7:58:00 PM

mig...@yahoo.com wrote:

In bash, missing arg. to exec (or am I doing something wrong?)

--
Etienne Marais
Cosmic Link
South Africa

Etienne Marais

Dec 8, 2005, 8:07:01 PM

Etienne Marais wrote:

-exec basename {} should be -exec basename {} +

(?)

John L

Dec 8, 2005, 8:08:21 PM

"Etienne Marais" <eti...@cosmiclink.co.za> wrote in message news:dnakps$hu$1...@ctb-nnrp2.saix.net...

You need an (escaped) semi-colon:
find . -name '*.c' -exec basename {} \; | whatever

--
John.

Chris F.A. Johnson

Dec 8, 2005, 8:31:42 PM

No, it shouldn't; basename takes at most two arguments. It should
be: -exec basename {} \;
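The reason "+" does not fit here is that find would then append many paths to a single basename call, and basename reads a second operand as a suffix to strip, not as another file. A short sketch of that behaviour, using a made-up path:

```shell
# With one operand, basename strips the directory part.
one=$(basename ./dir/photo.jpg)

# With two operands, the second is a suffix to remove from the result,
# not a second file name.
two=$(basename ./dir/photo.jpg .jpg)

printf '%s\n%s\n' "$one" "$two"
```

With "\;" find runs basename once per file, so each call gets exactly one operand.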

Michael Tosch

Dec 10, 2005, 9:16:46 AM

Besides the missing \; after the -exec,
aren't the following ones faster *and* run on old systems?

b)
find . -name '*.c' -print |
awk -F/ '{s[$NF]++} END {if(NR)for(i in s)if(s[i]>1)print s[i],i}'

a)
find . -name '*.c' -print > tmpf
awk -F/ '{s[$NF]++} END {if(NR)for(i in s)if(s[i]>1)print i}' tmpf |
while read f; do awk -F/ '$NF=="'$f'"' tmpf; done

Michael Tosch

Dec 10, 2005, 9:24:34 AM

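Correction: you never know what find finds (names may contain spaces or quotes, which would break the inline '$f' splice), so it is safer to pass the name to awk as a variable: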

a)
find . -name '*.c' -print > tmpf
awk -F/ '{s[$NF]++} END {if(NR)for(i in s)if(s[i]>1)print i}' tmpf |
while read f; do awk -F/ '$NF==F' F="$f" tmpf; done
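For what it is worth, the full paths of the duplicates can probably also be collected in a single awk pass, without the temporary file or the inner loop. This is only a sketch under the assumption that the whole path list fits in memory and that no path contains a newline; the function name list_dup_paths and the sample paths are made up.

```shell
# Read paths on stdin; print the full path of every file whose
# basename occurs more than once.
list_dup_paths() {
  awk -F/ '
    # Count each basename and remember every path that ends in it.
    { n[$NF]++; paths[$NF] = paths[$NF] $0 "\n" }
    # Print the remembered paths only for names seen more than once.
    END { for (b in n) if (n[b] > 1) printf "%s", paths[b] }
  '
}

# Hypothetical sample: x.c exists in two directories, y.c in one.
dup_paths=$(printf '%s\n' ./a/x.c ./b/y.c ./c/x.c | list_dup_paths)

printf '%s\n' "$dup_paths"
```

In real use: find . -name '*.c' -print | list_dup_paths. Paths sharing a name come out together, but the order of the groups is whatever awk's "for (b in n)" happens to produce.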