
synching with os.walk()


Andre Meyer

Nov 24, 2006, 10:57:02 AM
to pytho...@python.org
Hi all

os.walk() is a nice generator for performing actions on all files in a directory and its subdirectories. But how can one use os.walk() to walk through two hierarchies at once? I want to synchronise two directories (just backup for now), but I cannot see how to traverse a second hierarchy in step with the first. I do this now with os.listdir() recursively, which works fine, but I am afraid that recursion can become inefficient for large hierarchies.

thanks for your help
André

120...@gmail.com

Nov 24, 2006, 11:12:08 AM

I've run into wanting to work with parallel directory structures
before, and what I generally do is something like:

import os

for root, dirs, files in os.walk(dir1):
    # Map the current directory under dir1 onto its counterpart under dir2.
    dir2_root = dir2 + root[len(dir1):]
    for f in files:
        dir1_path = os.path.join(root, f)
        dir2_path = os.path.join(dir2_root, f)
        # ... compare or copy dir1_path to dir2_path here ...

Does this work for your needs?
-- Nils

Paddy

Nov 24, 2006, 11:27:09 AM

Andre Meyer wrote:

Walk each tree individually, gathering file names relative to the head of
the tree along with their modification times.

Compare the two sets of data to generate:
1. A list of what needs to be copied from the original to the copy.
2. A list of what needs to be copied from the copy to the original.

Do the copying.

You might want to show the user what needs to be done and give them
the option of aborting after generating the copy lists.
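
A minimal sketch of this approach (the snapshot helper and the 'original'/'copy' paths are illustrative, not from the thread):

import os

def snapshot(top):
    # Map each file's path, relative to 'top', to its modification time.
    snap = {}
    for root, dirs, files in os.walk(top):
        for name in files:
            full = os.path.join(root, name)
            rel = full[len(top):].lstrip(os.sep)
            snap[rel] = os.path.getmtime(full)
    return snap

left = snapshot('original')
right = snapshot('copy')

# Anything missing or newer on one side goes on the other side's copy list.
to_copy = [p for p in left if p not in right or left[p] > right[p]]
to_copy_back = [p for p in right if p not in left or right[p] > left[p]]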

- Paddy.

Paddy

Nov 24, 2006, 11:37:13 AM

Paddy wrote:

P.S. If you are on a Unix type system you can use tar to do the copying
as you can easily compress the data if it needs to go over a sow link,
and tar will take care of creating any needed directories in the
destination if you create new directories as well as new files.
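
For example, a compressed tar pipe driven from Python ('backuphost' and both paths are placeholders):

import subprocess

# Pack the source tree, compress it in transit, and unpack it on the far side.
cmd = ("tar czf - -C /path/to/src . | "
       "ssh backuphost 'tar xzf - -C /path/to/dst'")
subprocess.call(cmd, shell=True)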
- Paddy.

Thomas Ploch

Nov 24, 2006, 11:48:05 AM
to pytho...@python.org

Wouldn't it be better to implement the tree traversal in a class? Then
you could traverse two directory trees at once and do funny things
with it.
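
One possible shape for such a class (a sketch; the class and its names are illustrative):

import os

class PairedWalker(object):
    """Walk a tree together with its counterpart in a second tree."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def walk(self):
        for root, dirs, files in os.walk(self.left):
            rel = root[len(self.left):].lstrip(os.sep)
            # Yield each directory under 'left' with its twin under 'right'.
            twin = os.path.join(self.right, rel) if rel else self.right
            yield root, twin, files

# Usage: for left_dir, right_dir, files in PairedWalker('a', 'b').walk(): ...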

Thomas

Andre Meyer

Nov 24, 2006, 12:03:56 PM
to Paddy, pytho...@python.org
That sounds like a good approach.



Paddy

Nov 24, 2006, 12:09:38 PM

Paddy wrote:
> P.S. If you are on a Unix type system you can use tar to do the copying
> as you can easily compress the data if it needs to go over a sow link,

Sow links, transfers your data and then may form a tasty sandwich when
cooked.

(The original should, of course, read ...slow...)
- Pad.

Antoine De Groote

Nov 24, 2006, 4:44:10 PM

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/191017
might be what you are looking for, or at least a starting point...

Regards,
antoine

BartlebyScrivener

Nov 24, 2006, 10:17:42 PM
Antoine De Groote wrote:
>
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/191017
> might be what you are looking for, or at least a starting point...
>

There's an updated version of this script at pages 403-04 of the Python
Cookbook 2nd Edition.

rd

peter...@gmail.com

Nov 27, 2006, 4:46:37 AM

I wrote a script to perform this function using the dircmp class in the
filecmp module. I did something similar to this:
import filecmp, os, shutil

def backup(d1, d2):
    print 'backing up %s to %s' % (d1, d2)
    compare = filecmp.dircmp(d1, d2)
    # Copy anything that exists only on the source side.
    for item in compare.left_only:
        fullpath = os.path.join(d1, item)
        if os.path.isdir(fullpath):
            shutil.copytree(fullpath, os.path.join(d2, item))
        elif os.path.isfile(fullpath):
            shutil.copy2(fullpath, d2)
    # Refresh files that exist on both sides but differ.
    for item in compare.diff_files:
        shutil.copy2(os.path.join(d1, item), d2)
    # Recurse into directories common to both sides.
    for item in compare.common_dirs:
        backup(os.path.join(d1, item), os.path.join(d2, item))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 3:
        backup(sys.argv[1], sys.argv[2])

My script has some error checking and keeps up to 5 previous versions
of a changed file. I find it very efficient, even with recursion, as it
only actually copies those files that have changed. I sync somewhere
around 5 GB worth of files nightly across the network and I haven't had
any trouble.
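
Pete's versioning code isn't shown; a rotation helper in that spirit might look something like this (a hypothetical sketch, with names and layout assumed):

import os, shutil

def rotate_versions(path, keep=5):
    # Hypothetical: shift path.1 .. path.4 up one slot, dropping the
    # oldest copy, then save the current file as path.1.
    oldest = '%s.%d' % (path, keep)
    if os.path.exists(oldest):
        os.remove(oldest)
    for n in range(keep - 1, 0, -1):
        older = '%s.%d' % (path, n)
        if os.path.exists(older):
            os.rename(older, '%s.%d' % (path, n + 1))
    if os.path.exists(path):
        shutil.copy2(path, path + '.1')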

Of course, if I just had rsync available, I would use that.

Hope this helps,

Pete
