Has any consideration been given to using tarballs or ZIP
files to distribute the data? It seems to me this would be much faster than downloading
many individual XML files and would involve a LOT less server load.
Here are some stats from downloading files from citypage_weather:
The xml/ directory is about 29 MB according to "du -s" on my
local copy.
Running "wget --no-parent --mirror http://dd.weatheroffice.ec.gc.ca/citypage_weather/xml/" takes
about 20 minutes for me to download 23 MB of updates to this folder.
My average download speed was 112 kB/s, meaning it took
about 3.5 minutes to actually download all the XML files. The other 16.5
minutes was spent sending HTTP requests and waiting for responses.
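As a quick sanity check on that overhead (using the numbers above, plus the 1781-request count that wget reported), the per-request cost works out to roughly half a second:

```shell
# 16.5 minutes of non-transfer time spread over 1781 HTTP requests
awk 'BEGIN {
  overhead_s = 16.5 * 60        # 990 seconds spent sending requests / waiting
  requests   = 1781             # files fetched, per wget
  printf "%.2f s per request\n", overhead_s / requests
}'
# prints: 0.56 s per request
```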
Tar/gzipping the xml/ directory
takes less than a second and the resulting tarball was 1.7 MB in size. Downloading
this file would take one request (instead of the 1781 reported by wget) and
15-20 seconds at 112 kB/s.
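For what it's worth, the server-side step could be a one-liner on a cron schedule. A minimal sketch (the paths, dummy data, and the temp-file-then-rename step are my assumptions for illustration, not anything in place today):

```shell
#!/bin/sh
set -e
# Stand-in for the server's citypage_weather data root (hypothetical path).
DATA_ROOT=$(mktemp -d)
mkdir -p "$DATA_ROOT/xml"
echo '<siteData/>' > "$DATA_ROOT/xml/s0000001_e.xml"   # dummy city page

# Compress the whole xml/ tree in one pass; write to a temp name and
# rename, so a client never downloads a half-written archive.
tar -C "$DATA_ROOT" -czf "$DATA_ROOT/xml.tar.gz.tmp" xml/
mv "$DATA_ROOT/xml.tar.gz.tmp" "$DATA_ROOT/xml.tar.gz"

# A client would then need a single request, e.g.:
#   wget http://dd.weatheroffice.ec.gc.ca/citypage_weather/xml.tar.gz
tar -tzf "$DATA_ROOT/xml.tar.gz"   # lists the archived files
```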
I think it would make a lot of sense to at least distribute all
the XML data this way, since there’s very little extra space needed to store a
duplicate compressed copy. It would reduce the server load by a lot and cut
bandwidth usage a bit.
--
Thanks for your time,
Ryan Flegel, B.Sc.
Developer / Développeur
Farm Credit Canada / Financement agricole Canada
1800, rue Hamilton, C.P. 4320
Tel/Tél. : 306-780-7874 Fax/Télec. : 306-780-5655
E-mail/Courriel : ryan.flegel@fcc-fac.ca
Advancing the business of agriculture. Pour l’avenir de
l’agroindustrie.
Please consider
the environment before printing this e-mail. Pensons à l’environnement avant
d'imprimer ce courriel.