Has any consideration been given to using tarballs or ZIP files to distribute the data? It seems to me this would be much faster than downloading many individual XML files and would involve a LOT less server load.

Here are some stats from downloading files from citypage_weather:

The xml/ directory is about 29 MB according to "du -s" on my local copy.

Running "wget --no-parent --mirror http://dd.weatheroffice.ec.gc.ca/citypage_weather/xml/" takes about 20 minutes for me to download 23 MB of updates to this folder.

My average download speed was 112 kB/s, meaning it took about 3.5 minutes to actually download all the XML files. The remaining 16.5 minutes were spent sending HTTP requests and waiting for responses.
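
For reference, the rough arithmetic behind those numbers (approximate, since the transfer rate varied):

    23 MB / 112 kB/s ≈ 210 s ≈ 3.5 min of actual transfer
    20 min total - 3.5 min transfer ≈ 16.5 min of per-request overhead
    16.5 min / 1781 requests (the count wget reported, below) ≈ 0.55 s of overhead per file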

Tar/gzipping the xml/ directory takes less than a second, and the resulting tarball was 1.7 MB. Fetching that one file would take a single HTTP request (instead of the 1781 reported by wget) and about 15-20 seconds to download at 112 kB/s.
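
To sketch what this could look like (the tarball name and paths here are hypothetical, not something that exists on the server today):

    # Server side: rebuild a compressed snapshot of the xml/ tree
    tar -czf citypage_weather.tar.gz -C /path/to/citypage_weather xml

    # Client side: one request replaces the ~1781 individual GETs
    wget http://dd.weatheroffice.ec.gc.ca/citypage_weather/citypage_weather.tar.gz
    tar -xzf citypage_weather.tar.gz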

I think it would make a lot of sense to at least distribute all the XML data this way, since very little extra space is needed to store a duplicate compressed copy. It would reduce the server load by a lot and trim bandwidth usage a bit.
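
Since the compression step takes under a second, keeping that copy fresh should also be cheap; something like this hypothetical cron entry (paths made up) would rebuild it every 15 minutes:

    */15 * * * * tar -czf /var/www/data/citypage_weather.tar.gz -C /var/www/data/citypage_weather xml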

--

Thanks for your time,

Ryan Flegel, B.Sc.
Developer / Développeur
Farm Credit Canada / Financement agricole Canada

1800 Hamilton Street, P.O. Box 4320
1800, rue Hamilton, C.P. 4320
Regina SK S4P 4L3
Tel/Tél. : 306-780-7874  Fax/Télec. : 306-780-5655
E-mail/Courriel : ryan.flegel@fcc-fac.ca
