As of today I can no longer grab data from the Dutch tvgids.nl website. I'm using TVxb, which is one of the "friendly" tools (it tries to keep the traffic it puts on the website in mind).
TVxb uses wget (a port of the Linux tool for downloading web pages). I noticed that tvgids.nl is apparently not happy with clients identifying themselves as wget... as soon as I change the user-agent name I can download pages again!
Example:
C:\Program Files\TVxb\bin>wget http://www.tvgids.nl -t 1
--21:15:34-- http://www.tvgids.nl/
=> `index.html'
Resolving http://www.tvgids.nl... 195.144.10.130
Connecting to http://www.tvgids.nl[195.144.10.130]:80... connected.
HTTP request sent, awaiting response...
Read error (No such file or directory) in headers.
Giving up.
C:\Program Files\TVxb\bin>wget http://www.tvgids.nl -t 1 -U --user-agent="Internet Explorer"
--21:15:40-- http://www.tvgids.nl/
=> `index.html'
Resolving http://www.tvgids.nl... 195.144.10.130
Connecting to http://www.tvgids.nl[195.144.10.130]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
[ <=> ] 28,342 --.--K/s
21:15:40 (294.44 KB/s) - `index.html' saved [28342]
As you can see, I'm impersonating "Internet Explorer" in the second download attempt, and yes: I'm allowed in!
Unfortunately I can only do this manually, and I don't know how to 'fix' TVxb. Any suggestions?
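To show the mechanism outside of wget, here is a minimal Python sketch of the same idea: the only thing that changes between the two attempts is the User-Agent header sent with the request. The helper names build_request and fetch are made up for this illustration, and this is not how TVxb itself downloads pages; it only assumes the site reacts to the User-Agent string the way the transcript above suggests.

```python
import urllib.request

URL = "http://www.tvgids.nl/"  # the site from the transcript above


def build_request(url, user_agent):
    """Return a request that announces itself with the given User-Agent.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Try the download once (similar to wget -t 1); return (status, body)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()


if __name__ == "__main__":
    # Same agent string as in the second wget attempt.
    status, body = fetch(URL, "Internet Explorer")
    print(status, len(body))
```

If the site really filters on the agent name, swapping "Internet Explorer" for a string starting with "Wget" should make the same call fail, which would confirm that nothing else about the request matters.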