Questions about Scraping from thetvdb and themoviedb

craigrs84 · 2014-08-19, 05:30 PM

I'm trying to create my own scraper that reads from the scheduled_recordings table and then tries to map each record to an entry from thetvdb.com or themoviedb.org.

I've got a few questions that I'm hoping some people can weigh in on with advice or past experience.

1. Both thetvdb and themoviedb say you shouldn't call their services too frequently. What's the best strategy for caching the data and then periodically updating it? Is it just a matter of downloading the data and remembering the download date, then comparing that date against the current date the next time the data is requested, and re-downloading if a certain amount of time has elapsed?
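To make the date-comparison idea concrete, here is a minimal sketch of it in Python. The class name `TtlCache`, the `download` callback, and the `max_age_seconds` value are all made up for illustration; nothing here calls either site, and a real version would persist the entries to disk or to the database instead of keeping them in memory.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """Keeps each downloaded record together with the time it was fetched."""

    def __init__(self, max_age_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.max_age_seconds = max_age_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, download: Callable[[str], Any]) -> Any:
        """Return the cached record, re-downloading only if it is too old."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.max_age_seconds:
            return entry[1]  # still fresh, no request made
        data = download(key)  # hypothetical callback that hits the service
        self._entries[key] = (now, data)
        return data
```

With this approach a record is only re-requested when something actually asks for it after it has expired, so items nobody looks at never generate traffic.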

thetvdb also has a URL that you can call with a timestamp, and it gives you back a list of items that have changed since the last time you called it, but I'm wondering if this actually causes more downloading activity than the strategy discussed above. Also, I didn't see a similar option for themoviedb, and if possible I'd like the code/strategy to be consistent between the two.
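For comparison, this is roughly how the change-list approach would look once the list of changed IDs is in hand. It is only a sketch: `refresh_changed`, the `cache` dictionary, and the `download` callback are hypothetical names, and the call that fetches the change list itself is not shown.

```python
from typing import Callable, Dict, Iterable, List


def refresh_changed(cache: Dict[str, dict],
                    changed_ids: Iterable[str],
                    download: Callable[[str], dict]) -> List[str]:
    """Re-download only the cached series that appear in the change list."""
    refreshed = []
    for series_id in changed_ids:
        if series_id in cache:  # ignore changes to series we never cached
            cache[series_id] = download(series_id)
            refreshed.append(series_id)
    return refreshed
```

Under these assumptions the cost per update is one request for the change list plus one request per cached series that actually changed, so whether it beats the date-comparison strategy depends on how often the list is polled and how many cached items change.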

2. Specifically for thetvdb, what's the best strategy for matching the series? With tvdb you actually have to make multiple calls to their service: once to search for the series, and then potentially multiple times to download each series that is returned in the search results. Is it best to just download all the series first and then scan them all for the best matching episode? Or do you always try to narrow it down to a specific series first and download only that series rather than the full list returned in the search results? If the second option is taken, is there a reliable way to match the series? Many times the series name doesn't match up exactly with the title recorded by npvr/guide data. There's a zap2it id that can be more reliable, but it's not always populated in tvdb. There's a first aired date, but I'm not sure how reliable that is, etc.
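As a rough illustration of the second option (narrowing to one series before downloading anything), the sketch below prefers an exact zap2it id match and otherwise falls back to fuzzy title matching with Python's standard `difflib`. The `Candidate` fields, the `normalize` rules, and the 0.8 threshold are assumptions for the example, not anything defined by thetvdb or npvr, and it assumes the guide data and tvdb use the same id format.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Candidate:
    """One entry from a series search result (field names are hypothetical)."""
    series_id: str
    name: str
    zap2it_id: Optional[str] = None


def normalize(title: str) -> str:
    """Lowercase, drop a year suffix such as "(2013)", and strip punctuation."""
    title = re.sub(r"\(\d{4}\)", "", title.lower())
    return re.sub(r"[^a-z0-9]+", " ", title).strip()


def pick_series(recorded_title: str,
                recorded_zap2it_id: Optional[str],
                candidates: List[Candidate],
                min_score: float = 0.8) -> Optional[Candidate]:
    """Choose one candidate, or None if nothing matches well enough."""
    # An id match is treated as decisive when both sides have one.
    if recorded_zap2it_id:
        for candidate in candidates:
            if candidate.zap2it_id == recorded_zap2it_id:
                return candidate
    # Otherwise score each candidate by title similarity (0.0 to 1.0).
    wanted = normalize(recorded_title)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, wanted, normalize(candidate.name)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= min_score else None
```

Returning None when the best score is below the threshold leaves room to fall back to the first option (download every result and compare episodes) only for the ambiguous cases.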

Here's an example of thetvdb series search: http://thetvdb.com/api/GetSeries.php?seriesname=Reign

Thanks for your help.
