NextPVR Forums
New DynSource App - ABC Australia Video On Demand

 
ralphy
Offline

Senior Member

Posts: 255
Threads: 51
Joined: Nov 2006
#1
2007-02-18, 10:25 AM
After learning some basic C#, scouring the HTML pages at http://www.abc.net.au, and using Norton Internet Security's web history to find out which pages were being called, I've managed to write a new DynSource app that scrapes Australia's ABC video-on-demand pages for content, with the help of ubu's developer kit. http://forums.gbpvr.com//attachment.php?...1165135749 http://forums.nextpvr.com/showthread.php?t=21424&page=6

As best I can tell, the app works OK, but I have noticed a couple of problems that may be UbuStream related.


  1. The 'playschool40' scrape is no longer available at abc.net.au; however, it picks up duplicate information from the 'healthmatters' scrape. If all the sections are selected as groups in ubuconfig, Ubustream2 seems to get confused, and the 'healthmatters' sections (and those after it) point to the wrong URLs in GBPVR. My solution is not to select playschool40 for grouping in ubuconfig. (Interestingly, the playschool site seems to have been changed/removed only in the last 96 hours. Maybe ABC will fix their web page shortly.)
  2. I haven't found the ability to list the playlist within a group in a non-alphabetical order. I want to do this to preserve the playlist order of the scrape, since the order of the playlist is sometimes important for continuity. My solution was to insert a playlist number to preserve the ordering. This will fail to work properly if there are more than 99 items in the playlist (unlikely). An 'order by' option in ubuconfig for the group would be nice to enable a 'natural' order (as scraped) or 'alphabetical'
  3. I tried to add the duration of the clip to the title description, but this caused ubustream to crash as below.
************** Exception Text **************
System.IndexOutOfRangeException: Index was outside the bounds of the array.
at UbuStreamPlugin.UbuStreamImporter.read_PLS_File(String playlistFile, String myMediaPlayer, String myStreamType, String myGroup, Boolean addGroups)
at UbuStreamPlugin.UbuStreamImporter.Import(String playlistFile, ImportFileType playlistFileType, StreamType defaultStreamType, String defaultMediaPlayer, String UbuStreamXmlFile, String groupName, Boolean addGroups)
at UbuStreamPlugin.UbuStreamConfigForm.backgroundWorker1_RunWorkerCompleted(Object sender, RunWorkerCompletedEventArgs e)
at System.ComponentModel.BackgroundWorker.OnRunWorkerCompleted(RunWorkerCompletedEventArgs e)
at System.ComponentModel.BackgroundWorker.AsyncOperationCompleted(Object arg)



In the interim, I think the app is fairly robust, but consider it beta! Enjoy :D

As with all DynSource apps, extract the attachment into the ../plugins/ubstream/Dynsource subfolder, run ubuconfig and select the sections you want to scrape (But do not include playschool40 for reasons mentioned above).
Silverstone GD01S-MXR (three dead rows of pixels in the LCD and defective remote control), Power: Zalman ZM460B-APS (blew up - can't remember what's there now); CPU: Pentium D 3.2 GHz with Asus V72 Cooler; MD: Asus P5LD2 Deluxe 2048MB,
WDC WD10EADS 1TB Data, 320GB System, Asus EN9400GT Silent 512MB, Hauppauge HVR 1300,
XP Home SP3, GB-PVR 2.0, ExternalDisplay v0.3
ubu
Offline

Posting Freak

Posts: 792
Threads: 54
Joined: Jan 2006
#2
2007-02-19, 03:48 AM
ralphy Wrote:After learning some basic C#, scouring the HTML pages at http://www.abc.net.au, and using Norton Internet Security's web history to find out which pages were being called, I've managed to write a new DynSource app that scrapes Australia's ABC video-on-demand pages for content, with the help of ubu's developer kit. http://forums.gbpvr.com//attachment.php?...1165135749 http://forums.nextpvr.com/showthread.php?t=21424&page=6

As best I can tell, the app works OK, but I have noticed a couple of problems that may be UbuStream related.
Nice one, ralphy. :)

I'll install it and check it out later tonight. Meanwhile:
Quote:The 'playschool40' scrape is no longer available at abc.net.au; however, it picks up duplicate information from the 'healthmatters' scrape. If all the sections are selected as groups in ubuconfig, Ubustream2 seems to get confused, and the 'healthmatters' sections (and those after it) point to the wrong URLs in GBPVR. My solution is not to select playschool40 for grouping in ubuconfig. (Interestingly, the playschool site seems to have been changed/removed only in the last 96 hours. Maybe ABC will fix their web page shortly.)
If it doesn't get fixed, you can always filter it out of the available sections in the Get_Sections method so it doesn't even show up in UbuStream.

Quote:I haven't found the ability to list the playlist within a group in a non-alphabetical order. I want to do this to preserve the playlist order of the scrape, since the order of the playlist is sometimes important for continuity. My solution was to insert a playlist number to preserve the ordering. This will fail to work properly if there are more than 99 items in the playlist (unlikely). An 'order by' option in ubuconfig for the group would be nice to enable a 'natural' order (as scraped) or 'alphabetical'
Have you tried unchecking Use alphabetic sort for groups and stations in the Options->General panel of the Ubu config app? That should make everything show up in "as entered" sequence in both the config app and the GB-PVR UI (I'm pretty sure I put that option in v2.0 - I've been working on the next release so long, I can't be certain).

Quote:I tried to add the duration of the clip to the title description, but this caused ubustream to crash as below.
If you attach the code fragment where you're doing this (and the PLX file it produces), I'll take a look and see if I can figure out why that's happening.
GBPVR v1.3.11 HVR-1250, ES7300, 4GB, GeForce 9300, LianLi, Vista.
GBPVR v1.0.08 PVR-150, P4 2.26GHz, 1GB, GeForce 6200, Coupden, XP

Author: UbuStream plugin, UbuRadio plugin, EPGExtra utility.
ralphy
#3
2007-02-20, 01:07 AM
ubu Wrote:If it doesn't get fixed, you can always filter it out of the available sections in the Get_Sections method so it doesn't even show up in UbuStream.
The problem appears to be that because there are two identical titles for different sections (eg healthminutes and playschool40), the first URL is being overridden by the last. For the attached plx, because playschool40 is incorrect on the source webpage, "healthminutes" gets redirected to the playschool url in Ubustream. Is this because the database stores everything by title as a unique index??


ubu Wrote:Have you tried unchecking Use alphabetic sort for groups and stations in the Options->General panel of the Ubu config app? That should make everything show up in "as entered" sequence in both the config app and the GB-PVR UI (I'm pretty sure I put that option in v2.0 - I've been working on the next release so long, I can't be certain).

doh! If all else fails, read the instructions! Sorry about that

ubu Wrote:If you attach the code fragment where you're doing this (and the PLX file it produces), I'll take a look and see if I can figure out why that's happening.


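// Walk every title match on the page. match2 (the duration) is advanced in step with match1.
// Note: match3 (description) and match4 (web site) are not advanced inside this loop.
for (match1 = regex1.Match(myPage); match1.Success; match1 = match1.NextMatch())
{
    itemcount++;
    siteItem = new DynamicSource.SiteItem();

    // Each clip has its own numbered .asx playlist under the section URL.
    siteItem.URL = section.URL + "meta/hq" + itemcount + ".asx";

    // Earlier attempt: prefix a two-digit playlist number to preserve the scrape order.
    // siteItem.Title = itemcount.ToString("D2") + ": "+match1.Groups[1].ToString();//+ ": " +match2.Groups[1].ToString().Replace("\r","");//.Replace("‘","?").Replace("’","?");

    // Title with the clip duration appended (the change that triggers the importer crash).
    siteItem.Title = match1.Groups[1].ToString() + ": " + match2.Groups[1].ToString();

    siteItem.Description = match1.Groups[1] + "\n" + match3.Groups[2].ToString();
    siteItem.Description = siteItem.Description + "\n Duration: " + match2.Groups[1].ToString();
    siteItem.WebSite = match4.Groups[1].ToString(); // schedulePage;
    siteItem.Section = section.Name;
    newsItems.Add(siteItem);

    Console.WriteLine("Title >> " + siteItem.Title + "\n URL >> " + siteItem.URL + "\n Description: " + siteItem.Description);

    // Move on to the next duration match.
    match2 = match2.NextMatch();
}
return newsItems;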




As you can see, I tried to remove unprintable characters (\r), and I also tried ":", " ' ", and "\"", but these didn't seem to help (unless I missed something like the 'non-alphabetical' option!).

You'll also notice that at one stage I was using Replace("‘","?").Replace("’","?") (for match1), because sometimes the ‘ and ’ in the titles (e.g. title 40 in the attached plx) don't appear correctly on the GBPVR screen. It's obviously a character set issue, because I have the same problem when looking at the plx file in WordPad but not in Notepad.


Thanks ubu for a great plugin and any further insights you may have into the above. :)
ubu
#4
2007-02-20, 09:05 AM
ralphy Wrote:The problem appears to be that because there are two identical titles for different sections (eg healthminutes and playschool40), the first URL is being overridden by the last. For the attached plx, because playschool40 is incorrect on the source webpage, "healthminutes" gets redirected to the playschool url in Ubustream. Is this because the database stores everything by title as a unique index??
You are correct. "Station Name" is the primary key of the station database and, therefore, must be unique. The import code logic detects if the "Title" you are importing already exists in the database. If it exists, the item is not imported unless the URL is different, in which case, it assumes you want to update the URL. So, basically, it's a "last in wins" situation, so that's why the second URL is overwriting the first one.

One solution would be to check for duplicate titles in your code and append something to the second title to make it unique, e.g. "HRT and urinary incontinence - playschool40" (I've got to remember to watch that one, btw. Sounds pretty intense. :D ). On the other hand, since the "playschool40" items appear to be exactly the same as the "health minutes" items, why not just hard code something in "Get_Sections" to ignore "playschool40"? That way, it wouldn't get published as a section in UbuStream and its items wouldn't show up in your PLX file.
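A minimal sketch of both suggestions, in case it helps. The class and method names here (`TitleDeduper`, `MakeUnique`, `IsSkippedSection`) are made up for illustration and are not part of the DynSource developer kit; the idea is only to track titles already emitted and to skip a known-bad section by name. Whether the station database compares names case-insensitively is an assumption, so the caller chooses the comparer on the `HashSet`.

```csharp
using System;
using System.Collections.Generic;

public static class TitleDeduper
{
    // Hypothetical helper: returns a title not seen before in 'seen'.
    // A repeated title gets the section name appended; a counter is
    // added only if that combination is already taken as well.
    public static string MakeUnique(string title, string sectionName, HashSet<string> seen)
    {
        string candidate = title;
        if (!seen.Add(candidate))
        {
            candidate = title + " - " + sectionName;
            int n = 2;
            while (!seen.Add(candidate))
            {
                candidate = title + " - " + sectionName + " (" + n + ")";
                n++;
            }
        }
        return candidate;
    }

    // Hypothetical helper: true for sections that should never be published.
    public static bool IsSkippedSection(string sectionName)
    {
        return string.Equals(sectionName, "playschool40", StringComparison.OrdinalIgnoreCase);
    }
}
```

In the scrape loop this would be used as something like `siteItem.Title = TitleDeduper.MakeUnique(rawTitle, section.Name, seenTitles);`, with one `seenTitles` set shared across all sections so that the "last in wins" overwrite can't happen.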

Quote: As you can see, I tried to remove unprintable characters (\r), and I also tried ":", " ' ", and "\"", but these didn't seem to help (unless I missed something like the 'non-alphabetical' option!).

You'll also notice that at one stage I was using Replace("‘","?").Replace("’","?") (for match1), because sometimes the ‘ and ’ in the titles (e.g. title 40 in the attached plx) don't appear correctly on the GBPVR screen. It's obviously a character set issue, because I have the same problem when looking at the plx file in WordPad but not in Notepad.
I guess there are two different issues.

First, the non-printable characters in title 40:

I did a hex dump of your PLX file and sure enough, instead of the " ' " character (0x27) they have three bytes completely outside the "normal" ASCII range (xE2, x80 and x99). If you could determine the code page they're using, you could figure out what that means and try to translate it. Or you could simply do a Regex.Replace of "\xE2\x80\x99" with "\x27", which should put the correct apostrophe (') character there. [SIZE=2]Or you could just say "bugger it" and replace [\x80-\xFF] with a space, which would simply get rid of all non-printable characters. :)
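For what it's worth, the byte sequence E2 80 99 is the UTF-8 encoding of U+2019 (the right single quotation mark, ’), so one option is to decode the page as UTF-8 and then map the curly quotes to plain ASCII. The sketch below assumes the raw page bytes are available before they are turned into a string; `QuoteFixer` and its method names are hypothetical, and only standard .NET calls are used.

```csharp
using System;
using System.Text;

public static class QuoteFixer
{
    // Decodes raw page bytes as UTF-8, so E2 80 99 becomes the single
    // character U+2019 instead of three unprintable bytes.
    public static string DecodeUtf8(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    // Maps typographic single and double quotes to their ASCII equivalents.
    public static string ToAsciiQuotes(string text)
    {
        return text
            .Replace('\u2018', '\'')
            .Replace('\u2019', '\'')
            .Replace('\u201C', '"')
            .Replace('\u201D', '"');
    }
}
```

Calling `QuoteFixer.ToAsciiQuotes(QuoteFixer.DecodeUtf8(bytes))` on the scraped title keeps the apostrophe rather than blanking it out, which the replace-with-space approach would not.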

The other problem - appending the duration to the title causing an IndexOutOfRangeException when the UbuStream importer reads the PLX file:

Turns out this isn't caused by adding the duration info, per se, but is triggered by titles containing text within [ ] brackets. The import code assumes this is a meta-tag being passed as part of a WorldWide Media Project download. (I could probably use better criteria for determining that it's a WWMP stream, but that's another story). An example in your PLX file is "Baby Names [M - Coarse language]: 1.59". Before you added the duration, the "]" was the last character in the title, so the importer did a Split using the "]" character, couldn't see any data beyond it, and decided it wasn't a WWMP stream. Once you added the duration, the importer saw the extra data and fell into the WWMP processing code, which expected 5 additional items in the Split array, tried to access one and "kaboom". :eek:

I can actually fix this in the next release of UbuStream by checking that all five additional items are present before trying to read them. In the meantime, if you replace all [...] combos with (...) instead, appending the duration to the title should work fine.
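The interim workaround could look something like this on the DynSource side. It is only a sketch: `TitleCleaner` and `BracketsToParens` are illustrative names, and it swaps every square bracket (paired or not) because, as described above, it is the "]" character itself that the importer splits on.

```csharp
using System;

public static class TitleCleaner
{
    // Replaces square brackets with round ones so the title no longer
    // contains the "]" character that the importer treats as a meta-tag marker.
    public static string BracketsToParens(string title)
    {
        return title.Replace('[', '(').Replace(']', ')');
    }
}
```

Applied to the example title, `TitleCleaner.BracketsToParens("Baby Names [M - Coarse language]: 1.59")` gives "Baby Names (M - Coarse language): 1.59".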

Different subject: you probably know this already, but each link URL doesn't actually point to a discrete video item. They aggregate. For instance, if a section has Item1 through Item4, then Item1 points to an asx file that contains a playlist of items 1 through 4, Item2 points to an asx file with a playlist of items 2 through 4, Item3 has a playlist of Items 3 and 4 and, finally, Item4 points to a playlist containing just Item4. Weird. Not a problem, just weird. The format of the asx files is a little bit non-standard too and does present a problem for one of the features I'm working on for UbuStream v2.1. However, "forewarned is forearmed" and I'll probably figure out how to handle it.

The video streams themselves are pretty high quality. Oddly enough, even though they are 16:9 (400x224) streams, both WMP and VLC seem to play them at 4:3 when in fullscreen mode. If you force them to use 16:9, they look really nice. (With UbuStream v2.1, you'll be able to specify the aspect ratio of each station/stream or dynamic source, so it will automatically play them with the correct aspect ratio.)


[/SIZE]
ralphy
#5
2007-02-20, 12:20 PM
Thanks for your help ubu and clues on that powerful tool 'regex'.


ubu Wrote:One solution would be to check for duplicate titles in your code and append something to the second title to make it unique, e.g. "HRT and urinary incontinence - playschool40" (I've got to remember to watch that one, btw. Sounds pretty intense. :D ). On the other hand, since the "playschool40" items appear to be exactly the same as the "health minutes" items, why not just hard code something in "Get_Sections" to ignore "playschool40"? That way, it wouldn't get published as a section in UbuStream and its items wouldn't show up in your PLX file.
I might have to make this change. Interestingly, Playschool's 40th birthday video clips were available last week, but something changed at the ABC site and then UBUStream started pointing all the health matters clips to Playschool.

ubu Wrote:Different subject: you probably know this already but each link url doesn't actually point to a discrete video item. They aggregate. For instance, if a section has Item1 through Item4, then Item1 points to an asx file that contains a playlist of items 1 through 4, Item2 points to an asx file with a playlist of items 2 through 4, Item3 has a playlist of Items 3 and 4 and, finally, Item4 points to a playlist containing just Item4.
Yep, noticed this too ... it can make 'Play Group' an extra, extra long video sequence :) But then again, there's no need to 'Play Group' either. Just play the first clip.


Watch this space for the next build of OzABCDynSource addressing the above problems.


Next DynSource app - ninemsn.com.au. (You can tell I must be a homesick expat.)