2020-04-13, 07:31 AM
I recently completed some of the final steps in my movie library scraper, and I wanted to share how I do it with the NPVR users.
I've been collecting movies from OTA sources for years. It started as a hobby back when the Hauppauge SCART adapters could be used as an NPVR front end on the older v3 version. Then some software came around to collate the libraries (thanks, team Kodi) as well as metadata scrapers (hats off to TinyMM) to complete everything.
Some people collect stamps, others butterflies, I collect movies - around 3,300 of them so far.
The main problem before was threefold:
1) How to avoid recording what I already have in HD (typically 1080p or 720p)
2) How to re-record things which may have dropouts or other glitches
3) How to re-record other shows where I may have a low bitrate or lower resolution and want to get a better version
I fixed 2 and 3 a while ago with some scripting, JSON metadata files, and heavy post-processing. The update to v5 and a rewritten scheduler with the ability to add "custom" searches (effectively SQL scripts) really made new things possible. Caveat - this is not for the faint of heart: the scripts actively delete files going into postprocessing, and actively avoid recording movies on a "do not record" list.
There are still some bugs to iron out (handling file and directory names with ! in them), but it's around 95% there for me already.
Directory structures:
Recordings
_LOGS - logfiles, plus location of the DNR CSV file uploaded into the NPVR database
DO_NOT_RECORD - just empty directories with "cleaned" names of things to delete after or before recording
WATCH_SD - directories with the associated JSON files for SD movies already recorded (used by PostProcessing)
PREP_HD - directories with SD movies but with a very high bitrate (used by SQL scripts)
HD_REDO - movie names as directories where the original HD is a little more defective than usual or missing a minute at the beginning or end (used by PostProcessing, but I still need to work this into the SQL scripting, as these 5 are now being "avoided" by the scheduler)
WATCH_SD_SPECIAL - same as HD_REDO but for SD materials where movie names can vary
Do Not Record:
Movies land here by running the DeleteMovie.cmd file against them - I've integrated this in the W10 shell commands for directories to make life easy. The movie name is cleaned before a new entry is inserted into the DO_NOT_RECORD directory.
Watch SD Special:
Similar to above, this puts an entry into the WATCH_SD directory using the SD_Special_Delete.cmd file.
Note that Recordings is mapped as "R:" on my remote machine and is natively "H:" on the HTPC.
Database changes:
We need to add a new table called "DO_NOT_RECORD" which is updated daily as part of the PostUpdateEPG scripting. This is used in a SQL custom recording search which tries to avoid recording material which is already available. Due to the structure of the EPG_EVENT table, the concatenation of (Moviename - Year) is not available, and I don't want to blow up Sub's work by messing with existing tables. The DNR table uses a "cleaned" name where offensive combinations of dash, double space, ampersand, period and other punctuation are removed. I also have to forcibly remove the movie year from the cleaned name, because the name is used to exclude events from EPG_EVENT going forward, and the title field there only has the name.
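To make the "cleaned" name idea concrete, here is a minimal Python sketch. It is not the code from the ZIP (those are cmd/JScript files); the function name `clean_title`, the punctuation set and the year patterns are my assumptions, picked to mirror the characters named above and in the recording searches. Whatever cleaning you use, the same rules have to be applied on both sides of the comparison, otherwise the names will not match.

```python
import re

# Assumption: the characters stripped by the nested replace() calls in the searches.
_PUNCTUATION = "-_&.:,"

# Assumption: the year appears at the end as "- 1979" or "(1979)".
# A bare trailing number is left alone so "Blade Runner 2049" survives.
_TRAILING_YEAR = re.compile(r"\s*(?:-\s*(?:19|20)\d{2}|\((?:19|20)\d{2}\))\s*$")

def clean_title(name: str) -> str:
    """Return a cleaned movie name: no trailing year, no listed punctuation, single spaces."""
    name = _TRAILING_YEAR.sub("", name)
    # Remove the punctuation characters outright.
    name = name.translate(str.maketrans("", "", _PUNCTUATION))
    # Collapse runs of whitespace into one space and trim the ends.
    return re.sub(r"\s+", " ", name).strip()
```

For example, `clean_title("Mr. & Mrs. Smith (2005)")` gives `Mr Mrs Smith`.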
SQL Scripting
A script UpdateDNRDB.bat is run after the EPG update, where the do-not-record material is re-generated by scanning the existing directories - this helps ensure that recently cut/added movies are quickly excluded from the recording lineup. Categories are SD - things to re-record in HD / DHD - delete as HD existing / DNR - do not record. The local directories are scanned.
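As a rough illustration of the "scan directories, write a CSV, load it into the table" step, here is a Python sketch. The real UpdateDNRDB.bat is in the ZIP; the function names, the CSV column names and the mapping of folders to categories below are hypothetical and only meant to show the shape of the data.

```python
import csv
from pathlib import Path

def build_dnr_rows(category_dirs):
    """Return (title, source) pairs, one per subdirectory of each category folder.

    category_dirs maps a source code such as "DNR" or "SD" to a folder path.
    Which folder feeds which code is an assumption made by the caller.
    """
    rows = []
    for source, folder in category_dirs.items():
        folder = Path(folder)
        if not folder.is_dir():
            continue  # skip folders that do not exist yet
        for entry in sorted(folder.iterdir()):
            if entry.is_dir():
                rows.append((entry.name, source))
    return rows

def write_dnr_csv(rows, csv_path):
    """Write the rows to a CSV file with a header line."""
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["title", "source"])
        writer.writerows(rows)
```

A call might look like `build_dnr_rows({"DNR": r"H:\DO_NOT_RECORD", "SD": r"H:\PREP_HD"})`, with the paths adjusted to your own layout.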
Recording Searches
There are two searches - one for an HD channel group (where I can exclude the DNR and DHD movies) and one for an SD channel group (where I can exclude the previous, plus a number of SD movies, currently around 500, where I have such a high bit rate it's not going to get better). These searches ensure that I can avoid wasting tuner capacity on stuff I already have or have decided I do not want.
Search #1 for HD Channels
[select * from epg_event where] genres like '%Movie%' and channel_oid in (select channel_oid from channel_group where group_oid = 17)
and replace(replace(replace(replace(replace(replace(replace(title, '-', ''), '_', ''), '&', ''), '.', ''), ':', ''), ',', ''), '  ', ' ')
NOT IN (SELECT title FROM do_not_record WHERE source in ('DHD', 'DNR') AND title IS NOT NULL)
Search #2 for SD Channels
[select * from epg_event where] genres like '%Movie%' and channel_oid in (select channel_oid from channel_group where group_oid = 18)
and replace(replace(replace(replace(replace(replace(replace(title, '-', ''), '_', ''), '&', ''), '.', ''), ':', ''), ',', ''), '  ', ' ')
NOT IN (SELECT title FROM do_not_record WHERE source in ('DHD', 'DNR', 'SD') AND title IS NOT NULL)
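If you want to try the exclusion logic outside NPVR first, the following Python sketch runs the same kind of filter against an in-memory SQLite database. The two tables here are cut-down stand-ins with made-up rows, not the real NPVR schema, and the channel group condition is left out. The outermost replace is assumed to collapse a double space into a single one. The sketch also shows why the `title IS NOT NULL` guard matters: a NULL inside a NOT IN list would make the whole condition fail for every row.

```python
import sqlite3

# The cleaning expression from the searches above, applied to the EPG title.
CLEAN = ("replace(replace(replace(replace(replace(replace(replace("
         "title, '-', ''), '_', ''), '&', ''), '.', ''), ':', ''), ',', ''), '  ', ' ')")

def demo_connection():
    """Create an in-memory database with simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE epg_event (title TEXT, genres TEXT);
        CREATE TABLE do_not_record (title TEXT, source TEXT);
        INSERT INTO epg_event VALUES
            ('Alien', 'Movie'),
            ('Mr. & Mrs. Smith', 'Movie/Comedy'),
            ('Heat', 'Movie'),
            ('News', 'News');
        INSERT INTO do_not_record VALUES
            ('Mr Mrs Smith', 'DNR'),
            ('Heat', 'SD'),
            (NULL, 'DNR');
    """)
    return conn

def candidate_titles(conn, sources):
    """Return movie titles whose cleaned name is not excluded by the given sources."""
    placeholders = ", ".join("?" for _ in sources)
    sql = (
        "SELECT title FROM epg_event WHERE genres LIKE '%Movie%' "
        f"AND {CLEAN} NOT IN ("
        f"SELECT title FROM do_not_record WHERE source IN ({placeholders}) "
        "AND title IS NOT NULL) ORDER BY title"
    )
    return [row[0] for row in conn.execute(sql, list(sources))]
```

With the sample rows, `candidate_titles(conn, ["DHD", "DNR"])` keeps Alien and Heat, while adding `"SD"` to the list also drops Heat.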
Post Update EPG
This file does most of the data collection, ensuring that the movie library is scanned and the WATCH_SD and DELETE_HD directories are created to support both PostProcessing and the update of the DNR database table. It calls a batch file FFPROBER which parses existing JSON data (and re-creates it if missing) to sort movies dynamically into HD and SD material, and notes the total bit rate. You need to call FFPROBER for all movie drives as well as for subdirectories where you use them. FFPROBER uses a couple of JScript files to parse the FFPROBER data and write out neat JSON files in each movie directory.
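The HD/SD sorting boils down to reading the video height and the overall bit rate from the probe data. Below is a small Python sketch of that idea; it is not the JScript from the ZIP. It assumes the JSON comes from something like `ffprobe -v quiet -print_format json -show_format -show_streams <file>`, and the 720-line threshold and the function name `summarize_probe` are my own choices based on the "1080p or 720p" remark above.

```python
import json

# Assumption: 720 lines and above counts as HD.
HD_MIN_HEIGHT = 720

def summarize_probe(probe_json: str) -> dict:
    """Extract height, HD/SD class and total bit rate from ffprobe JSON text."""
    data = json.loads(probe_json)
    # Take the first video stream; audio and subtitle streams are ignored.
    video = next(
        (s for s in data.get("streams", []) if s.get("codec_type") == "video"),
        None,
    )
    if video is None:
        raise ValueError("no video stream found")
    height = int(video["height"])
    # ffprobe reports bit_rate as a string; fall back to 0 if it is missing.
    bit_rate = int(data.get("format", {}).get("bit_rate", 0))
    return {
        "height": height,
        "quality": "HD" if height >= HD_MIN_HEIGHT else "SD",
        "bit_rate": bit_rate,
    }
```

The returned dictionary is the sort of thing that could be written out as the per-movie JSON file.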
PostProcessing
Takes the completed recording and determines whether it is kept or deleted. Keeping means that it's either 1) a new movie or 2) an SD recording with a higher bit rate or 3) a new HD recording of local SD material. The movie is deleted if 1) it's on the DNR list but got recorded by mistake in the SQL data or 2) it's in SD quality but lower than what I have already.
If the movie is kept, a JSON file is created locally with the metadata, and an HTML snippet (HTMX) is created to help populate the HTML directory file.
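The keep/delete rules can be written down as one small function. This Python sketch only restates the rules from this post; the function name `decide` and the dictionary layout are hypothetical. Treating an equal bit rate as "not better" and deleting when an HD copy already exists are my reading of the categories, and the HD_REDO and WATCH_SD_SPECIAL cases are left out.

```python
from typing import Optional

def decide(new: dict, existing: Optional[dict], on_dnr_list: bool) -> str:
    """Return "keep" or "delete" for a finished recording.

    new and existing are dicts with "quality" ("HD" or "SD") and "bit_rate".
    existing is None when the movie is not in the library yet.
    """
    if on_dnr_list:
        return "delete"  # recorded by mistake, it is on the do-not-record list
    if existing is None:
        return "keep"  # a new movie
    if existing["quality"] == "SD":
        if new["quality"] == "HD":
            return "keep"  # new HD recording of local SD material
        # SD against SD: keep only if the bit rate is strictly higher.
        return "keep" if new["bit_rate"] > existing["bit_rate"] else "delete"
    # Assumption: an HD copy already exists, so the new recording is not needed.
    return "delete"
```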
Movies which are re-recordings are tagged with a "z-" before the directory name to find them more easily.
HTML Directory
Scripting is run after PostProcessing to update the HTML directory of current recordings so I can quickly decide whether to keep and process further or delete. Hyperlinks with the movie name for Wikipedia and Rotten Tomatoes are created.
You will notice some checking for existing txt files and random delays - this is designed to avoid PostProcessing clobbering the HTML directory in case multiple recordings end at the exact same time.
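The txt-file check plus random delay is essentially a lock file with back-off. Here is a Python sketch of that pattern, under my own assumptions: the function name `append_with_lock`, the retry count and the delay range are all made up, and the actual scripts in the ZIP do this in cmd.

```python
import os
import random
import time

def append_with_lock(target, snippet, lock_path, retries=20, max_delay=0.5):
    """Append snippet to target while holding a lock file; return True on success."""
    for _ in range(retries):
        try:
            # O_EXCL makes creation fail if another process already holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            # Someone else is writing: wait a random moment and try again.
            time.sleep(random.uniform(0, max_delay))
            continue
        try:
            with open(target, "a", encoding="utf-8") as fh:
                fh.write(snippet)
            return True
        finally:
            # Always release the lock, even if the write failed.
            os.close(fd)
            os.remove(lock_path)
    return False
```

The random delay means two recordings that finish in the same second are unlikely to retry in lockstep.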
I've attached a ZIP file with the scripts as well as sample data directories and structures where you can see examples of the metadata stored and how it's handled.
NPVR.zip (Size: 317.15 KB / Downloads: 6)
Happy Easter!
Dane
- Dane
Cheap Medion Minitower, but it's quiet enough
- Windows 11x64
- Digital Devices DVB-C PCIe (1x Cine quad and 1x Cine dual)
- LG C2 with Jellyfin front end, NPVR back end
- 30TB storage, ~7,500 films