NextPVR Forums

Full Version: Automatic movie recording based on existing libraries
Pages: 1 2
I recently completed some of the final steps in my movie library scraper, and I wanted to share with NPVR users how I do it.

I've been collecting movies from OTA sources for years - it started as a hobby back when the Hauppauge SCART adapters could be used as an NPVR front end on the older v3 version. Then software came along to collate the libraries (thanks, team Kodi), as well as metadata scrapers (hats off to TinyMM) to complete everything.

Some people collect stamps, others butterflies; I collect movies - around 3,300 of them so far.

The main problem was threefold:

1) How to avoid recording what I already have in HD (typically 1080p or 720p)
2) How to re-record things which may have dropouts or other glitches
3) How to re-record other shows where I may have a low bitrate or lower resolution and want to get a better version

I fixed 2 and 3 a while ago with some scripting, JSON metadata files, and heavy post-processing. The update to v5, with its rewritten scheduler and the ability to add "custom" searches (effectively SQL scripts), really made new things possible. Caveat - this is not for the faint of heart: the scripts actively delete files going into PostProcessing, and actively avoid recording movies on a "do not record" list.

There are still some bugs to iron out (handling file and directory names with ! in them), but it's around 95% there for me already.

Directory structures:
Recordings
_LOGS - logfiles, plus location of the DNR CSV file uploaded into the NPVR database
DO_NOT_RECORD - just empty directories with "cleaned" names of things to delete after or before recording
WATCH_SD - directories with the associated JSON files for SD movies already recorded (used by PostProcessing)
PREP_HD - directories with SD movies but with a very high bitrate (used by SQL scripts)
HD_REDO - movie names as directories where the original HD is a little more defective than usual, or missing a minute at the beginning/end (used by PostProcessing, but this still needs working into the SQL scripting, as these 5 are currently being "avoided" by the scheduler)
WATCH_SD_SPECIAL - same as HD_REDO but for SD materials where movie names can vary

Do Not Record:
Movies land here by running the DeleteMovie.cmd file against them - I've integrated this into the W10 shell commands for directories to make life easy. The movie name is cleaned before a new entry is inserted into the DO_NOT_RECORD directory.

Watch SD Special:
Similar to above, this puts an entry into the WATCH_SD directory using the SD_Special_Delete.cmd file.

Note that Recordings is mapped as "R:" on my remote machine and is natively "H:" on the HTPC.

Database changes:
We need to add a new table called "DO_NOT_RECORD", which is updated daily as part of the PostUpdateEPG scripting. It is used in a custom SQL recording search which tries to avoid recording material that is already available. Due to the structure of the EPG_EVENT table, the concatenation (Moviename - Year) is not available, and I don't want to blow up Sub's work by messing with existing tables. The DNR table uses a "cleaned" name where offensive combinations of dash, double space, ampersand, period and other punctuation are removed. I also have to forcibly remove the movie year from the cleaned name, since it is matched against the EPG_EVENT title going forward, and that field only has the name.
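The cleaning itself is simple string surgery. Here is a minimal Python sketch of what such a clean-up might look like - the authoritative rules live in the scripts in the attached ZIP, and the exact punctuation list here is an assumption:

```python
import re

def clean_title(title: str, strip_year: bool = True) -> str:
    """Normalize a movie title the way the DNR table expects (a sketch)."""
    # Drop a trailing "(YYYY)" year, since the EPG_EVENT title has no year.
    if strip_year:
        title = re.sub(r"\s*\(\d{4}\)\s*$", "", title)
    # Strip the punctuation the SQL search expression also removes.
    for ch in "-_&.:,":
        title = title.replace(ch, "")
    # Collapse double spaces left behind by the removals.
    while "  " in title:
        title = title.replace("  ", " ")
    return title.strip()
```

The important detail is that both sides of the comparison (the DNR entry and the EPG title) must go through the same normalization, otherwise they never match.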

SQL Scripting
A script UpdateDNRDB.bat is run after the EPG update; it regenerates the do-not-record material by scanning the existing local directories - this helps ensure that recently cut/added movies are quickly excluded from the recording lineup. The categories are SD (things to re-record in HD), DHD (delete, an HD copy exists) and DNR (do not record).
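The regeneration step can be sketched as a directory walk that emits (title, source) rows for the CSV loaded into the DNR table. This is a sketch, not the batch file from the ZIP; the mapping of directories to categories is an assumption based on the layout above:

```python
import csv
from pathlib import Path

def build_dnr_rows(category_dirs):
    """Yield (title, source) rows for the do_not_record table.

    category_dirs maps a source tag ('DNR', 'SD', 'DHD') to the directory
    whose subdirectory names are the cleaned movie titles."""
    for source, root in category_dirs.items():
        root = Path(root)
        if not root.is_dir():
            continue  # tolerate a missing drive or share
        for entry in sorted(root.iterdir()):
            if entry.is_dir():
                yield (entry.name, source)

def write_dnr_csv(path, category_dirs):
    """Write the CSV that the EPG-update scripting loads into the database."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(build_dnr_rows(category_dirs))
```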

Recording Searches
There are two searches - one for an HD channel group (where I exclude the DNR and DHD movies) and one for an SD channel group (where I exclude those, plus a number of SD movies (currently around 500) where the bit rate is already so high it's not going to get better). These searches ensure that I don't waste tuner capacity on material I already have or have decided I do not want.

Search #1 for HD Channels
[select * from epg_event where] genres like '%Movie%' and channel_oid in (select channel_oid from channel_group where group_oid = 17)
and replace(replace(replace(replace(replace(replace(replace(title, '-', ''), '_', ''), '&', ''), '.', ''), ':', ''), ',', ''), '  ', ' ')
NOT IN (SELECT title FROM do_not_record WHERE source in ('DHD', 'DNR') AND title IS NOT NULL)

Search #2 for SD Channels
[select * from epg_event where] genres like '%Movie%' and channel_oid in (select channel_oid from channel_group where group_oid = 18)
and replace(replace(replace(replace(replace(replace(replace(title, '-', ''), '_', ''), '&', ''), '.', ''), ':', ''), ',', ''), '  ', ' ')
NOT IN (SELECT title FROM do_not_record WHERE source in ('DHD', 'DNR', 'SD') AND title IS NOT NULL)
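As a sanity check, the nested replace() chain can be exercised against a throwaway SQLite database (assumption: NextPVR v5 stores its data in SQLite, so the expression behaves identically; the sample titles and table contents here are made up):

```python
import sqlite3

# Minimal in-memory mock of the two tables the searches touch.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE epg_event (title TEXT, genres TEXT);
    CREATE TABLE do_not_record (title TEXT, source TEXT);
    INSERT INTO epg_event VALUES ('Spider-Man: Homecoming', 'Movie'),
                                 ('Die Gänsemagd', 'Movie');
    INSERT INTO do_not_record VALUES ('SpiderMan Homecoming', 'DNR');
""")

# The same cleaning expression used in the recording searches above.
CLEAN = ("replace(replace(replace(replace(replace(replace(replace("
         "title, '-', ''), '_', ''), '&', ''), '.', ''), ':', ''), "
         "',', ''), '  ', ' ')")

rows = con.execute(
    f"SELECT title FROM epg_event WHERE genres LIKE '%Movie%' AND {CLEAN} "
    "NOT IN (SELECT title FROM do_not_record WHERE source IN ('DHD','DNR') "
    "AND title IS NOT NULL)"
).fetchall()
# Only the movie that is not on the DNR list survives the filter.
```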

Post Update EPG
This file does the bulk of the data collection, ensuring that the movie library is scanned and the WATCH_SD and DELETE_HD directories are created to support both PostProcessing and updating the DNR database table. It calls a batch file, FFPROBER, which parses existing JSON data (re-creating it if missing) to sort movies dynamically into HD and SD material and note the total bit rate. You need to call FFPROBER for all movie drives, as well as for any subdirectories you use. FFPROBER uses a couple of JScript files to parse the ffprobe output and write out neat JSON files in each movie directory.
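The HD/SD sort itself boils down to reading the video stream dimensions and the container bit rate out of ffprobe's JSON output ("streams" entries with width/height, "format" with bit_rate are ffprobe's real field names). A sketch of that classification - the 1280-pixel threshold is illustrative, not taken from the scripts:

```python
def classify(probe: dict, hd_width: int = 1280) -> dict:
    """Classify one movie from a parsed ffprobe JSON document.

    Raises StopIteration if the file has no video stream (a real script
    would handle that case explicitly)."""
    video = next(s for s in probe["streams"] if s.get("codec_type") == "video")
    bitrate = int(probe["format"].get("bit_rate", 0))
    return {
        "width": video["width"],
        "height": video["height"],
        "bitrate": bitrate,
        "quality": "HD" if video["width"] >= hd_width else "SD",
    }
```

The probe dict would typically come from running `ffprobe -print_format json -show_streams -show_format <file>` and parsing the output.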

PostProcessing
Takes the completed recording and determines whether it is kept or deleted. It is kept if it's 1) a new movie, 2) an SD recording with a higher bit rate, or 3) a new HD recording of local SD material. It is deleted if 1) it's on the DNR list but got recorded by mistake via the SQL data, or 2) it's in SD quality but lower than what I already have.
If the movie is kept, a JSON file with the metadata is created locally, and an HTML snippet (HTMX) is created to help populate the HTML directory file.
Movies which are re-recordings are tagged with a "z-" before the directory name to find them more easily.
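The keep/delete rules above can be condensed into one decision function. This is a sketch of the logic, not the actual PostProcessing script; the library/DNR data structures are hypothetical:

```python
def keep_recording(title, quality, bitrate, library, dnr):
    """Return True if a finished recording should be kept.

    library maps cleaned title -> {"quality": "HD"/"SD", "bitrate": int};
    dnr is the set of cleaned titles on the do-not-record list."""
    if title in dnr:
        return False                          # on the DNR list: recorded by mistake
    existing = library.get(title)
    if existing is None:
        return True                           # new movie
    if quality == "HD" and existing["quality"] == "SD":
        return True                           # HD upgrade of local SD material
    if quality == "SD" and existing["quality"] == "SD":
        return bitrate > existing["bitrate"]  # keep only a higher-bitrate SD copy
    return False                              # nothing gained: delete
```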

HTML Directory
Scripting is run after PostProcessing to update the HTML directory of current recordings, so I can quickly decide whether to keep (and process further) or delete. Hyperlinks with the movie name for Wikipedia and Rotten Tomatoes are created.
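Generating those lookup links is a one-liner per site. A sketch - the URL patterns below are assumptions about how such links could be built, not necessarily the ones the scripts use:

```python
from urllib.parse import quote

def movie_links(title: str) -> dict:
    """Build Wikipedia and Rotten Tomatoes lookup links for a movie name."""
    return {
        "wikipedia": "https://en.wikipedia.org/wiki/"
                     + quote(title.replace(" ", "_")),
        "rottentomatoes": "https://www.rottentomatoes.com/search?search="
                          + quote(title),
    }
```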

You will notice some checking for existing txt files and random delays - this is designed to stop PostProcessing clobbering the HTML directory in case multiple recordings end at exactly the same time.
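The same guard can be expressed as a lock file plus a random back-off: whoever creates the file first proceeds, everyone else sleeps a random interval and retries. This is a sketch of the pattern, not the batch implementation; the file name and timings are illustrative:

```python
import os
import random
import time

def with_html_lock(lock_path, work, attempts=10):
    """Run work() while holding an exclusive lock file; retry with a
    random delay if another process holds it."""
    for _ in range(attempts):
        try:
            # O_EXCL makes creation atomic: it fails if the file exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            time.sleep(random.uniform(0.5, 3.0))  # random delay, then retry
            continue
        try:
            os.close(fd)
            return work()
        finally:
            os.remove(lock_path)  # always release the lock
    raise TimeoutError("could not acquire HTML directory lock")
```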

I've attached a ZIP file with the scripts as well as sample data directories and structures where you can see examples of the metadata stored and how it's handled.
[attachment=1173]

Happy Easter!
    Dane
Hi Sub,

I've been reviewing the effects of validating deletions on the movie name only (a handy single field in the epg_event table) and have come to the conclusion that there are too many overlaps for this method to work at full efficacy.

The structure for epg_event is below

[attachment=1188]

We've discussed the "unique_id", which is a proprietary ID used only by Schedules Direct; there seems to be no schema other than MVxxxxx for movies, and no database or source where I can get a list of these IDs and cross-reference them to my existing media. I think it might be Gracenote, but if so, it's utopian to think access is possible.

[attachment=1189]

Which leaves the "Moviename (date)" option, which is about 99% reliable, but there is no easy way to access this using SQL. The movie metadata is stored as a date:time, which also makes the concatenation in SQL a real PITA to fix at runtime.

Can you please look at extending the epg_event table by one field, storing the typical (based on user settings) filename to be used for the recording? A field called "unique_name" would, for line 8, have "Die Gänsemagd (2008)" as content, letting me update the do_not_record table with the same nomenclature and avoid recording content I already have.

I thought of doing this manually somehow, but realised I would only get in the way of your EPG population and of whatever runtime(s) execute the SQL advanced searches in your code. I assume you run the scan some minutes after the EPG is updated, but it could be immediate, or perhaps after PostEPGUpdate is run - in any event, my SQL skills are nowhere near good enough to update the table quickly and safely.

Any chance you can look at this as a future development?

Thanks,
 Dane
(2020-04-14, 05:55 AM)daneo Wrote: [ -> ]We've discussed the "unique_id", which is a proprietary ID used only by Schedules Direct; there seems to be no schema other than MVxxxxx for movies, and no database or source where I can get a list of these IDs and cross-reference them to my existing media. I think it might be Gracenote, but if so, it's utopian to think access is possible.
The other EPG sources can use this field too. For example, with DVB EPG you might end up with a string like 'Emmerdale/10563924', or with XMLTV a string like 'Killer-S02E03'. Ultimately though, this is just used to represent a unique show identifier. It can be in lots of different formats. There is nothing more you can derive from the contents of this field (i.e., it's not intended to be human readable, and it can't be looked up against a database anywhere, etc.).
(2020-04-14, 05:55 AM)daneo Wrote: [ -> ]Which leaves the "Moviename (date)" option, which is about 99% reliable, but there is no easy way to access this using SQL. The movie metadata is stored as a date:time, which also makes the concatenation in SQL a real PITA to fix at runtime.

Can you please look at extending the epg_event table by one field, storing the typical (based on user settings) filename to be used for the recording? A field called "unique_name" would, for line 8, have "Die Gänsemagd (2008)" as content, letting me update the do_not_record table with the same nomenclature and avoid recording content I already have.

I thought of doing this manually somehow, but realised I would only get in the way of your EPG population and of whatever runtime(s) execute the SQL advanced searches in your code. I assume you run the scan some minutes after the EPG is updated, but it could be immediate, or perhaps after PostEPGUpdate is run - in any event, my SQL skills are nowhere near good enough to update the table quickly and safely.

Any chance you can look at this as a future development?
Honestly, it doesn't make sense for me to do this. For the majority of NextPVR users around the world, I have no idea if a particular show is a movie, and I have no idea what year that movie might have been made (original_air_date is not set). Also, the vast majority of EPG_EVENT records are not movies.

In your case, you're using Schedules Direct and have a lot of this info available to you, so you should be able to roll your own solution. You already have access to the title and original_air_date fields, so you should be able to build a string like the one you suggested above. You could run whatever job does that in PostEPGUpdate.bat. SQL has decent functions for extracting parts of a date, like the year.
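This suggestion can be sketched against a throwaway SQLite database, deriving "Title (Year)" from title plus original_air_date with strftime(). The unique_name column and the date format are assumptions for illustration:

```python
import sqlite3

# In-memory stand-in for the epg_event table with an added unique_name column.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE epg_event (title TEXT, original_air_date TEXT,
                            unique_name TEXT);
    INSERT INTO epg_event (title, original_air_date)
        VALUES ('Die Gänsemagd', '2008-01-01 00:00:00');
""")

# Build "Title (Year)" in pure SQL; strftime('%Y', ...) extracts the year.
con.execute("""
    UPDATE epg_event
       SET unique_name = title || ' (' || strftime('%Y', original_air_date) || ')'
     WHERE original_air_date IS NOT NULL
""")
name = con.execute("SELECT unique_name FROM epg_event").fetchone()[0]
```

Run from PostEPGUpdate (or, per the later posts in this thread, PostLoadEPG), the UPDATE would populate the field before the scheduler's next sweep.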
Hi Sub,

Thanks for the honest answer - I will look into it. Can you at least guarantee that the scheduled recording queries will only be executed after PostUpdateEPG completes? If so, I have a time and code window in which I can do my magic in the database before another scheduling sweep happens.

Thanks,
Dane
(2020-04-14, 02:24 PM)daneo Wrote: [ -> ]Can you at least guarantee that the scheduled recording queries will only be executed after PostUpdateEPG completes?  If this is the case, then I have a time and code window in which I can do my magic in the database before another scheduling sweep happens.

NextPVR schedules recordings... 1) at the end of the EPG Update process, and 2) when a one-time or recurring recording is added via, for example, the TV Guide.

I would expect that PostUpdateEPG.bat runs after the re-scheduling that occurs as part of the (manual or automatic) Update EPG process.
No, you have the wrong one - see https://forums.nextpvr.com/showthread.ph...tUpdateEPG. You want PostLoadEPG if you want to change the database before the recurring recordings are done.

Martin
(2020-04-14, 02:39 PM)mvallevand Wrote: [ -> ]No, you have the wrong one - see https://forums.nextpvr.com/showthread.ph...tUpdateEPG. You want PostLoadEPG if you want to change the database before the recurring recordings are done.

The inside of your head is something special ... I can't find posts (and usually can't remember posts) from last week ... for you, two and a half years ago is no problem ... Huh
(2020-04-14, 02:54 PM)Graham Wrote: [ -> ]The inside of your head is something special ... I can't find posts (and usually can't remember posts) from last week ... for you, two and a half years ago is no problem ... Huh
lol. I'd have to agree with that Big Grin
Yep, what Martin said... PostLoadEPG.bat