
zap2it-guidescraping's People

Contributors

billybob1354, daniel-widrick, daniel15, gabebster, gpoupon, jeremybrhere, micahmo, pseudoresonance, rgruetz, th0ma7

zap2it-guidescraping's Issues

Support for non-OTA listings

I'm trying to download listings for a Cable broadcast, but it seems like I can only download OTA broadcast listings

Ampersand in 'display-name' element value results in xml not parsable by jellyfin.

xml with

	<channel id="81302">
		<display-name>49.3 WMLWDT3</display-name>
		<display-name>49.3</display-name>
		<display-name>WMLWDT3</display-name>
		<display-name>Heroes & Icons Network</display-name>
		<icon src="http://zap2it.tmsimg.com/h3/NowShowing/81302/s90401_h3_aa.png" />
	</channel>

results in a stack trace and no guide data in Jellyfin 10.7.6:

[21:24:39] [ERR] [21] Emby.Server.Implementations.LiveTv.EmbyTV.EmbyTV: Error adding metadata
System.Xml.XmlException: An error occurred while parsing EntityName. Line 499, position 25.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, Int32 lineNo, Int32 linePos)
   at System.Xml.XmlTextReaderImpl.HandleEntityReference(Boolean isInAttributeValue, EntityExpandType expandType, Int32& charRefEndPos)
   at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
   at System.Xml.XmlTextReaderImpl.FinishPartialValue()
   at System.Xml.XmlTextReaderImpl.get_Value()
   at System.Xml.XmlReader.InternalReadContentAsString()
   at System.Xml.XmlSubtreeReader.ReadContentAsString()
   at System.Xml.XmlReader.ReadElementContentAsString()
   at Jellyfin.XmlTv.XmlTvReader.ProcessNode(XmlReader reader, Action`1 setter, String languageRequired, Action`1 allOccurrencesSetter)
   at Jellyfin.XmlTv.XmlTvReader.GetChannel(XmlReader reader)
   at Jellyfin.XmlTv.XmlTvReader.GetChannels()
   at Emby.Server.Implementations.LiveTv.Listings.XmlTvListingsProvider.GetChannels(ListingsProviderInfo info, CancellationToken cancellationToken)
   at Emby.Server.Implementations.LiveTv.EmbyTV.EmbyTV.GetEpgChannels(IListingsProvider provider, ListingsProviderInfo info, Boolean enableCache, CancellationToken cancellationToken)
   at Emby.Server.Implementations.LiveTv.EmbyTV.EmbyTV.AddMetadata(IListingsProvider provider, ListingsProviderInfo info, IEnumerable`1 tunerChannels, Boolean enableCache, CancellationToken cancellationToken)
   at Emby.Server.Implementations.LiveTv.EmbyTV.EmbyTV.GetChannelsAsync(Boolean enableCache, CancellationToken cancellationToken)
[21:24:39] [INF] [21] Emby.Server.Implementations.LiveTv.LiveTvManager: Refreshing guide with 7 days of guide data

I can test fixing this by escaping or removing the & and submit a patch.
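A minimal sketch of the escaping approach, using only the Python standard library. It is not the script's actual code; the variable names are made up for illustration. Building text through a DOM text node (or calling `escape` by hand) turns a bare `&` into `&amp;`, which is what XML parsers such as Jellyfin's expect.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

name = "Heroes & Icons Network"

# Option 1: escape by hand when the XML is assembled from strings.
escaped = escape(name)
print(escaped)

# Option 2: let the DOM serializer escape text nodes on output.
doc = minidom.Document()
el = doc.createElement("display-name")
el.appendChild(doc.createTextNode(name))
serialized = el.toxml()
print(serialized)
```

Removing the `&` would also avoid the parse error, but escaping keeps the channel name intact.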

How to use this correctly?

I hate to be that guy, but I have been googling for the past 2 hours trying to find some kind of info, literally ANYTHING on how to use this, and there is nothing.

The Readme is basically useless. (I'm sorry to be blunt, but it is)

The only thing this script seems to do at all is the -h for the help section.

Every single other command simply returns 4-6 errors depending on the command, and then nothing.

I have already edited my config file to be correct, it should "just work" based on what I've read, but it doesn't.

This is the error I get every time.

Traceback (most recent call last):
  File "C:\Users\Administrator\Downloads\zap2it-GuideScraping-20220901\zap2it-GuideScrape.py", line 301, in <module>
    guide = Zap2ItGuideScrape(optConfigFile,optGuideFile)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\Downloads\zap2it-GuideScraping-20220901\zap2it-GuideScrape.py", line 16, in __init__
    self.lang = self.config.get("prefs","lang")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\configparser.py", line 797, in get
    d = self._unify_values(section, vars)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\configparser.py", line 1168, in _unify_values
    raise NoSectionError(section) from None
configparser.NoSectionError: No section: 'prefs'
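One thing worth checking with this error: `ConfigParser.read()` silently skips files it cannot find, so a wrong path or file name produces `NoSectionError` even when the file on disk is fine. The sketch below is a hypothetical pre-flight check, not part of the script. The required sections and keys are assumptions taken from the config examples posted in other issues here.

```python
import configparser

# Assumed layout, based on config files users posted in these issues.
REQUIRED = {
    "creds": ["username", "password"],
    "prefs": ["country", "zipCode", "lang"],
}

def check_config(path):
    """Return a list of human-readable problems found in the config file."""
    config = configparser.ConfigParser()
    # read() returns the list of files it actually parsed; empty means not found.
    if not config.read(path):
        return [f"config file not found or unreadable: {path}"]
    problems = []
    for section, keys in REQUIRED.items():
        if not config.has_section(section):
            problems.append(f"missing section [{section}]")
            continue
        for key in keys:
            # Option names are matched case-insensitively by default.
            if not config.has_option(section, key):
                problems.append(f"missing key '{key}' in [{section}]")
    return problems
```

Calling `check_config("zap2itconfig.ini")` before constructing the scraper would turn the traceback above into a readable list of what is missing.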

Lack of exception handling

I am frequently getting errors similar to this:

thumnailEl = self.CreateElementWithData("thumnail","http://zap2it.tmsimg.com/assets/" + event["thumbnail"] + ".jpg")
TypeError: can only concatenate str (not "NoneType") to str

I am not sure exactly why sometimes the object is empty, but it would probably be best to handle the exception in a way that does not cause a total failure and results in a guide still being generated, even if incomplete.
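A minimal sketch of one way to guard against this, assuming `event` is a plain dict as the traceback suggests. The helper name `thumbnail_url` is hypothetical; the caller would simply skip the thumbnail element when it returns `None`, so the rest of the guide is still generated.

```python
def thumbnail_url(event):
    """Return the thumbnail URL for an event, or None if it has no thumbnail."""
    thumb = event.get("thumbnail")
    if not thumb:
        # Covers a missing key, None, and an empty string.
        return None
    return "http://zap2it.tmsimg.com/assets/" + thumb + ".jpg"
```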

Config file example?

Am I an idiot or why can I not intuitively figure out how to create a properly formatted config file? Is there a sample I can review somewhere please? I really do not understand what is supposed to go into it. From reading the python file (which was the only way I even determined that the file would need to be named "zap2itconfig.ini") and a bit of research on properly formatting an ini file, I managed to piece together the following, which raises a "HTTP Error 401: Unauthorized" error. So at least I got beyond the MissingSectionHeaderError and NoSectionError. But a little guidance would be wonderful, please.

 [url]
 "https://tvlistings.zap2it.com/api/printGrid?lineupId=...etc
 [creds]
 [email protected]
 username=my_username
 password=my_password
 [prefs]
 country=USA
 zipCode=00000
 lang=en-us
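For comparison, here is a sketch that generates a config with only the `[creds]` and `[prefs]` sections, using the key names that appear in a working-looking config posted in another issue on this page. This is an assumption about what the script reads, not documentation from the author, and the values are placeholders.

```python
import configparser
import io

def sample_config_text():
    """Build a sample zap2itconfig.ini as text (placeholder values)."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key capitalisation such as zipCode
    config["creds"] = {
        "Username": "you@example.com",
        "Password": "your_password",
    }
    config["prefs"] = {
        "country": "USA",
        "zipCode": "00000",
        "historicalGuideDays": "14",
        "lang": "en",
    }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()

print(sample_config_text())
```

Note that every line inside a section has to be a `key=value` or `key: value` pair, so a bare URL or e-mail address on its own line will not be read as a setting.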

Deleted

Never mind, the file had bad line endings from being edited on Windows.

Output all <channel> tags before others?

Could you please update your script to output the <channel> tags before all the others such as <programme>, or add an option for it?

When matching M3U playlists to XMLTV EPGs, some IPTV clients stop parsing the XML file as soon as they hit a tag that's not <channel>, to reduce the time it takes to load all the channel data (since they don't need the programme data at this point, only the channel data). For example, TiviMate does this. In the case of your zap2it-GuideScraping script, this means that TiviMate (by default) only shows the first channel in the resulting guide 😢
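Until the script supports this, the reordering can be sketched as a post-processing step on the generated file. This uses `xml.etree.ElementTree` rather than whatever the script uses internally, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def channels_first(root):
    """Move all <channel> children ahead of everything else, keeping order."""
    children = list(root)
    # list.sort() is stable, so channels and programmes each keep
    # their original relative order. False sorts before True.
    children.sort(key=lambda el: el.tag != "channel")
    root[:] = children
    return root

root = ET.fromstring(
    '<tv><channel id="1"/><programme channel="1"/>'
    '<channel id="2"/><programme channel="2"/></tv>'
)
channels_first(root)
print([el.tag for el in root])
```

For a real file, `ET.parse("xmlguide.xmltv")` followed by `tree.write(..., encoding="utf-8", xml_declaration=True)` would wrap the same call.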

Error handling language

Since I migrated to your Python solution, I noticed that French accented characters stopped working in tvheadend.
They end up being shown as:

Pour faire le tour des nouvelles et connaître les sujets qui font les manchettes à Montréal et partout au Québec; pour écouter les reportages de nos journalistes dans toutes les régions, les entrevues sur les sujets les plus chauds.

Where really all éèê... ends-up being î or é ...

I looked at the xmltv output differences and noticed a few things.
First, here is the other perl script output:

	<programme start="20210517053000 -0400" stop="20210517060000 -0400" channel="I2.1.45867.zap2it.com">
		<title lang="fr">Les Schtroumpfs</title>
		<sub-title lang="fr">Larmes de Schtroumpfs; L&apos;étoffe du temps</sub-title>
		<desc lang="fr">Le Grand Schtroumpf crée une substance magique qui permet à une seule rose de devenir aussi odorante que toute une roseraie, mais une petite erreur cause tous les Schtroumpfs dans les alentours de fondre en larmes.</desc>
		<category lang="fr">Family</category>
		<category lang="fr">Series</category>
		<length units="minutes">30</length>
		<icon src="https://zap2it.tmsimg.com/assets/p16054575_e_v4_aa.jpg" />
		<url>https://tvlistings.zap2it.com//overview.html?programSeriesId=SH00158738&amp;tmsId=EP001587380006</url>
		<episode-num system="dd_progid">EP00158738.0006</episode-num>
		<previously-shown />
		<subtitles type="teletext" />
		<rating>
			<value>G</value>
		</rating>
	</programme>

And here the python one for that exact same "french language" tv show:

    <programme start="20210517093000 +0000" stop="20210517100000 +0000" channel="45867">
      <title lang="en">Les Schtroumpfs</title>
      <sub-title lang="en">Larmes de Schtroumpfs; L&amp;apos;étoffe du temps </sub-title>
      <desc lang="en">Le Grand Schtroumpf crée une substance magique qui permet à une seule rose de devenir aussi odorante que toute une roseraie, mais une petite erreur cause tous les Schtroumpfs dans les alentours de fondre en larmes.</desc>
      <length units="minutes">30</length>
      <category>family</category>
  <thumbnail>http://zap2it.tmsimg.com/assets/p16054575_e_v4_aa.jpg</thumbnail>
<episode-num system="SxxExx">S03E00</episode-num><episode-num system="dd_progid">EP00158738.0006</episode-num>    </programme>

Some of the differences I found:

  • <title lang="fr"> set to en instead of fr
  • <sub-title lang="en"> set to en instead of fr
  • <desc lang="en"> set to en instead of fr
  • Missing newline before ending </programme>
  • Missing the <previously-shown />, <subtitles type="teletext" />, <rating>, <url>
  • Using <thumbnail> instead of <icon src=... (not sure it makes any difference)
  • Providing a "single" category instead of multiples (honestly I believe the lang="fr" is both wrong and useless):
<category lang="fr">Family</category>
<category lang="fr">Series</category>
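Two of the differences above (the `lang` attribute and the double-escaped `&amp;apos;`) can be sketched as follows. This assumes the source data already contains entity-encoded text such as `&apos;`, which is a guess based on the output shown; the helper name is hypothetical and this is not the script's code.

```python
import html
from xml.dom import minidom

def build_sub_title(doc, raw_text, lang):
    """Create a <sub-title> element with a lang attribute and clean text."""
    el = doc.createElement("sub-title")
    el.setAttribute("lang", lang)
    # Decode entities first so the serializer does not escape them twice.
    el.appendChild(doc.createTextNode(html.unescape(raw_text).strip()))
    return el

doc = minidom.Document()
el = build_sub_title(doc, "Larmes de Schtroumpfs; L&apos;étoffe du temps ", "fr")
# Encode explicitly as UTF-8 so accented characters survive the write.
xml_bytes = el.toxml().encode("utf-8")
print(xml_bytes.decode("utf-8"))
```

When writing the whole guide to disk, opening the file with `encoding="utf-8"` (or writing bytes as above) avoids depending on the platform's default encoding, which is a common cause of garbled accents.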

How to setup to work with Jellyfin

How can I set up this repository to work with Jellyfin? I assume I need to automate this script to run every 10 minutes or something like that.

I am using Ubuntu 20.04 LTS and Jellyfin 10.7.7.

Any example of how to run it?

I'm completely new to this and have only basic Linux skills. I tried running the .sh script but I get errors. I suppose I need to fill in some things in the script, right?

I added my username and password, country USA, and zip code, and I still get errors, so I guess there is something I haven't filled in in the script.

Any example?

What is the ini file? Is it the one that ends with .dist? I filled in my info in that one. Thanks.

Suggestions for improvement

Hi,
I started looking into your project because my alternative, a Perl script, is no longer maintained and is not trivial to package, since it requires compiling mandatory CPAN modules (and is now only available through the archive section).
https://web.archive.org/web/20200426004001/http://zap2xml.awardspace.info/

I played a bit with your Python scripts and must say that they run quite well. As a developer in the SynoCommunity, I intend to package your Python scripts so that they are available for both Jellyfin and tvheadend. While playing with them, I thought I would provide you with a list of potential improvements and nice-to-haves that would simplify integration down the road:

  1. Ability to pass the configuration file as an argument: this allows splitting the files into sub-directories and makes them easier to maintain through package upgrades (scripts can be updated independently from the user configuration).
  2. Ability to define a location (either as an argument and/or through the configuration file) for the generated xmltv file. Similar to item 1, it allows granting permissions only on that directory so other applications can access it.
  3. Ability to keep a few backup copies of the xmltv file, perhaps using a timestamp. The latest file would always have the same name, while the next update would create an xmlguide.TIMESTAMP.xmltv copy. Cleanup can then be managed through a cron job or within the Python script itself (for instance, keeping X days of backups).
  4. The Perl script I referred to had a few nice features; one was to download picons locally as well. This can be quite useful for some use cases. Here is an example .png listing generated by the Perl zap2it script:
10.1 CFTMDT.png
10.1_CFTMDT.png
10.1.png
12.1 CFCFDT.png
12.1_CFCFDT.png
12.1.png
...
s10084_h3_aa.png
s10108_h3_aa.png
s10125_h3_aa.png
...
  5. Again, the Perl script had the ability to cache the downloaded information. This allows re-downloading only what was not previously cached and reduces hits on the server. Having this capability with a dedicated caching directory would be really nice. One caveat: when a sub-cache file contains programs with "$sTBA" (to be advertised), they must be discarded or re-downloaded at the next update, because the program info may have been updated by then (e.g. some movies 10-14 days out may not yet be advertised, but the program guide might receive an update a few days before they air).
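Item 3 above can be sketched with the standard library alone. The function name, the `keep_days` parameter and the timestamp format are all assumptions for illustration; only the `xmlguide.TIMESTAMP.xmltv` naming comes from the suggestion itself.

```python
import os
import shutil
import time

def rotate_backup(guide_path, keep_days=7):
    """Copy the current guide to a timestamped backup and prune old backups.

    Returns the path of the new backup, or None if there was no guide yet.
    """
    directory = os.path.dirname(os.path.abspath(guide_path))
    filename = os.path.basename(guide_path)
    base, ext = os.path.splitext(filename)

    backup = None
    if os.path.exists(guide_path):
        stamp = time.strftime("%Y%m%d%H%M%S")
        backup = os.path.join(directory, f"{base}.{stamp}{ext}")
        # shutil.copy does not preserve mtime, so the backup's
        # modification time is the moment it was created.
        shutil.copy(guide_path, backup)

    cutoff = time.time() - keep_days * 86400
    for name in os.listdir(directory):
        is_backup = (
            name != filename
            and name.startswith(base + ".")
            and name.endswith(ext)
        )
        path = os.path.join(directory, name)
        if is_backup and os.path.getmtime(path) < cutoff:
            os.remove(path)
    return backup
```

Calling `rotate_backup("xmlguide.xmltv")` just before the script overwrites the guide would keep the latest file under a fixed name, as described in item 3.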

I'll be looking at creating a SynoCommunity package over the next few weeks. Let me know if you are interested at adapting a few bits of your scripts.

Ref: SynoCommunity/spksrc#3932 (comment)

Thnx in advance and again, nice work :)

Guide data blank when loaded into Jellyfin unless channels manually mapped, everything listed as "new"

Hello, should this work with OTA TV?
When loaded into Jellyfin, the guide is completely blank unless every channel is manually mapped.

I think the channel mapping is incorrect. I can manually go in and map channels, but shouldn't this be mapped automatically?

After manually mapping a channel, the guide data is correct; however, everything is listed as "new".

These shows are not listed as "new" on the zap2it website.

Is this a Jellyfin bug, or am I running the script incorrectly? Thanks for any guidance!

output file:
xmlguide.xmltv.zip

config file:

[creds]
Username: [email protected]
Password: REDACTED
[prefs]
country: USA
zipCode: 55303
historicalGuideDays: 14
lang: en

Doesn't handle if thumbnail is missing

Namespace(configfile=None, outputfile=None, language=None, findid=False)
Loading config:  ./zap2itconfig.ini  and outputting:  xmlguide.xmltv
Load Guide for time:  1674734400.0
Load Guide for time:  1674745200.0
Load Guide for time:  1674756000.0
Load Guide for time:  1674766800.0
Traceback (most recent call last):
  File "/opt/zap2it-GuideScraping/./zap2it-GuideScrape.py", line 313, in <module>
    guide.BuildGuide()
  File "/opt/zap2it-GuideScraping/./zap2it-GuideScrape.py", line 256, in BuildGuide
    self.AddEventsToGuide(json)
  File "/opt/zap2it-GuideScraping/./zap2it-GuideScrape.py", line 109, in AddEventsToGuide
    self.rootEl.appendChild(self.BuildEventXmL(event,channel["channelId"]))
  File "/opt/zap2it-GuideScraping/./zap2it-GuideScrape.py", line 141, in BuildEventXmL
    thumnailEl = self.CreateElementWithData("thumnail","http://zap2it.tmsimg.com/assets/" + event["thumbnail"] + ".jpg")
TypeError: can only concatenate str (not "NoneType") to str

[runitor] exit status 1

Also it seems like there's a typo there - "thumnail" instead of "thumbnail"?

Jellyfin (10.7.7) flags all shows as new.

To fix, I modified the script to add in "previously-shown" flag if "New" is never found.
My fix looks like:

        # Handle flags
        # KMM - Added this isNew variable for the Jellyfin guide
        isNew = False
        for flag in event["flag"]:
            if flag == "New":
                programEl.appendChild(self.guideXML.createElement("New"))
                # KMM - Record that the "New" flag was found.
                isNew = True
            if flag == "Finale":
                programEl.appendChild(self.guideXML.createElement("Finale"))
            if flag == "Premiere":
                programEl.appendChild(self.guideXML.createElement("Premiere"))
        # KMM - If no "New" flag was found, add the previously-shown flag for Jellyfin.
        if not isNew:
            programEl.appendChild(self.guideXML.createElement("previously-shown"))

Filtering Channels/Output

Hi, great app, works like a charm!

One thing I'm curious about: is it possible to filter the output so it only generates an xmltv of the channels I want, as opposed to all channels? I'd love the ability to mark favorites on the zap2it website then output those to an xmltv file.

It looks like the printGrid API has the ability to filter output with the user token and filter name, but I can't work out the parameters for a filtered output with the grid API. At worst, the XML generation could only output channel IDs listed in the ini file, but it would still have to scrape the entire guide (inefficient).
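The fallback described above (scrape everything, then emit only selected channels) could look roughly like this as a post-processing step. It is a sketch using `xml.etree.ElementTree`; the function name is hypothetical, and how the wanted IDs would be listed in the ini file is left open.

```python
import xml.etree.ElementTree as ET

def filter_guide(root, wanted_ids):
    """Drop <channel> and <programme> elements not in wanted_ids."""
    wanted = set(wanted_ids)
    # Iterate over a copy because we remove children while looping.
    for el in list(root):
        if el.tag == "channel" and el.get("id") not in wanted:
            root.remove(el)
        elif el.tag == "programme" and el.get("channel") not in wanted:
            root.remove(el)
    return root

root = ET.fromstring(
    '<tv><channel id="81302"/><channel id="45867"/>'
    '<programme channel="81302"/><programme channel="45867"/></tv>'
)
filter_guide(root, ["45867"])
print(ET.tostring(root, encoding="unicode"))
```

This does not reduce the number of requests made to zap2it, only the size of the output, which matches the "inefficient" caveat above.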

I can make a pull request for the latter but wanted to list it here in case there's something I'm overlooking.

Thanks!

LineupID missing?

Am I missing something on the zap2it page? I don't see headendid or lineupid in the URL except referring to themselves. Nothing that looks unique. I'm in TV Listings, press the Print button, and I see lineupid=lineupid. Does that suffice for the script?

tvheadend epg grabber

Heads-up: I asked the tvheadend developers for pointers (or help) on how your code could be migrated to become a new default EPG grabber in tvheadend. That would allow using tvheadend's built-in functionality for capturing EPG data (OTA scanning, flat file, URL download, etc.).

FYI:
https://tvheadend.org/issues/6039

another error help.

Sorry, I don't know what happened this time. I didn't touch anything.
The ini file is fine.
I ran it again and it gives me this error:

03-15-2022 (013344)

I ran it with this:
python3 zap2it-GuideScrape.py -c zap2itconfig.ini -o xmlfile.xml

and the .ini file configuration is all fine.

2022-03-15_01-44-17
