2014-08-19, 05:30 PM
I'm trying to create my own scraper that reads from the scheduled_recordings table and then tries to map each record to an entry from thetvdb.com or themoviedb.org.
I've got a few questions that I'm hoping some people can weigh in on with advice or past experience.
1. Both thetvdb and themoviedb say you shouldn't call their services too frequently. What's the best strategy for caching the data and then periodically updating it? Is it just a matter of downloading the data and remembering the date that you downloaded it, and then the next time the data is requested you compare the downloaded date vs the current date, and if a certain amount of time has elapsed then re-download it?
thetvdb also has a URL that you can call with a timestamp, and it returns a list of items that have changed since that time. I'm wondering whether this actually causes more downloading activity than the strategy discussed above. Also, I didn't see a similar option for themoviedb, and if possible I'd like the code/strategy to be consistent between the two.
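A minimal sketch of the age-based strategy from question 1, assuming the actual download is done by a function you supply. `TtlCache`, `fetch`, and `max_age_seconds` are illustrative names, not part of either service's API, and a real scraper would persist the entries (for example in a database table) rather than keep them in memory.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """Re-download an item only when the cached copy is older than max_age_seconds."""

    def __init__(
        self,
        fetch: Callable[[str], Any],          # hypothetical: downloads one item by key
        max_age_seconds: float,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        # key -> (time downloaded, downloaded data)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]                   # still fresh: no call to the service
        value = self._fetch(key)              # missing or stale: download again
        self._entries[key] = (now, value)
        return value
```

Because the cache only depends on a key and a download function, the same class could sit in front of both thetvdb and themoviedb, which keeps the strategy consistent between the two.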
2. Specifically for thetvdb, what's the best strategy for matching the series? For tvdb you actually have to make multiple calls to their service: once to search for the series, and then potentially multiple times to download each series returned in the search results. Is it best to download all the series first and then scan them all for the best matching episode? Or do you always try to narrow it down to a specific series first and download only that series rather than the full list returned in the search results? If the second option is taken, is there a reliable way to match the series? Many times the series name doesn't match up exactly with the title recorded by npvr/guide data. There's a zap2it id that can be more reliable, but it's not always populated in tvdb. There's a first aired date, but I'm not sure how reliable that is... etc.
Here's an example of thetvdb series search: http://thetvdb.com/api/GetSeries.php?seriesname=Reign
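One possible way to narrow the search results to a single series before downloading anything else is sketched below. It is only an illustration: the candidate dictionaries and their `name` and `zap2it_id` keys are assumed shapes for already-parsed search results, not thetvdb's actual response format, and the 0.8 threshold is an arbitrary starting point. It prefers an exact zap2it id match when one is available and otherwise falls back to fuzzy title similarity.

```python
import difflib
import re
from typing import Dict, Optional, Sequence


def normalize(title: str) -> str:
    """Lower-case, drop a trailing-style year such as "(2013)", and strip punctuation."""
    title = title.lower()
    title = re.sub(r"\(\d{4}\)", "", title)
    title = re.sub(r"[^a-z0-9 ]+", " ", title)
    return " ".join(title.split())


def pick_series(
    recorded_title: str,
    candidates: Sequence[Dict[str, str]],   # hypothetical parsed search results
    zap2it_id: Optional[str] = None,
    min_score: float = 0.8,                 # assumed threshold, tune as needed
) -> Optional[Dict[str, str]]:
    # 1. An exact zap2it id match wins outright when the guide data provides one.
    if zap2it_id:
        for candidate in candidates:
            if candidate.get("zap2it_id") == zap2it_id:
                return candidate

    # 2. Otherwise pick the candidate whose normalized name is most similar.
    wanted = normalize(recorded_title)
    best, best_score = None, 0.0
    for candidate in candidates:
        name = normalize(candidate.get("name", ""))
        score = difflib.SequenceMatcher(None, wanted, name).ratio()
        if score > best_score:
            best, best_score = candidate, score

    # 3. Refuse to guess when nothing is close enough.
    return best if best_score >= min_score else None
```

Returning `None` for weak matches leaves room for a further tie-breaker such as the first aired date, or for flagging the recording for manual matching.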
Thanks for your help.