Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception caught! undefined entity: line 45, column 38 #32

Open
macdis opened this issue Sep 8, 2013 · 15 comments
Open

Exception caught! undefined entity: line 45, column 38 #32

macdis opened this issue Sep 8, 2013 · 15 comments

Comments

@macdis
Copy link

macdis commented Sep 8, 2013

Hi,

So when running the script I get an exception and the script exits without creating gamelist.xml. I can't figure out what the issue is as the gamesdb lookup does return valid xml.

ES-scraper, a scraper for EmulationStation
Scanning folder..(/home/pi/RetroPie/roms/mame)
Trying to identify xxxxxx.zip..
Exception caught! undefined entity: line 45, column 38
No new games added.
All done!

(Obviously, what I am calling "xxxxxx.zip" is a valid ROM and doing a manual gamesdb lookup on it returns valid xml.)

Any ideas? Thanks!

@gr0ebi
Copy link

gr0ebi commented Sep 8, 2013

Hi there,

got exaclty same problem, look here:

ES-scraper, a scraper for EmulationStation
Boxart downloading disabled.
Re-scraping all games..
Verbose mode enabled.
Scanning folder..(/home/pi/RetroPie/roms/atari800)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/atari2600)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/c64)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/doom)
Trying to identify prboom.wad..
Exception caught! undefined entity: line 45, column 38
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/duke3d)
Trying to identify duke3d.grp..
Exception caught! undefined entity: line 45, column 38
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/gb)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/gba)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/gbc)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/gamegear)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/intellivision)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/mame)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/fba)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/x86)
Gamelist already exists: /home/pi/RetroPie/roms/x86/gamelist.xml
Trying to identify Start.txt..
Exception caught! undefined entity: line 45, column 38
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/scummvm)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/mastersystem)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/megadrive)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/neogeo)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/nes)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/pcengine)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/psx)
No new games added.
Scanning folder..(/home/pi/RetroPie/roms/psp)
No new games added.
Scanning folder..(/home/pi/samba/roms/snes)
Trying to identify xxxxxxxxxx.smc..
Exception caught! undefined entity: line 45, column 38
No new games added.
All done!

@elpendor
Copy link
Owner

elpendor commented Sep 8, 2013

I don't have my enviroment set up but I'll try to do it quickly and have a look at it.

What's the actual filename?

@gr0ebi
Copy link

gr0ebi commented Sep 8, 2013

"Super_Bomberman_2.smc" is my filename. But this error also appears on "pre-installed" roms:

Scanning folder..(/home/pi/RetroPie/roms/duke3d)
Trying to identify duke3d.grp..
Exception caught! undefined entity: line 45, column 38

@gr0ebi
Copy link

gr0ebi commented Sep 8, 2013

sudo apt-get update && sudo apt-get upgrade
sudo reboot

this did my job! Working now!

@macdis
Copy link
Author

macdis commented Sep 8, 2013

Updating and upgrading did not work for me. Same error as before. Using Retropie 1.7 on a 512 MB model B.

@gr0ebi
Copy link

gr0ebi commented Sep 8, 2013

Very strange, I added some other roms and got same error like before :S

@elpendor
Copy link
Owner

elpendor commented Sep 8, 2013

I would advise you to revert to a previous commit for now. The plan has been (for a while) to port it to c++ and integrate it with ES so.. we'll see what happens.

@macdis
Copy link
Author

macdis commented Sep 9, 2013

I tried previous versions back to May or so. None work.

@gr0ebi
Copy link

gr0ebi commented Sep 9, 2013

I got many problems since kernel version 3.6.11+, probably this Errors also result from this kernel version

uname -a
Linux raspberrypi 3.6.11+ #538 PREEMPT Fri Aug 30 20:42:08 BST 2013 armv6l GNU/Linux
I'm also using RPi 512MB Model B

@dVerge
Copy link

dVerge commented Sep 12, 2013

I just tried it with the CRC argument and it's working, but only grabbing a handful of games.

@sphanley
Copy link

Has anyone figured out a working solution to this issue? I'm experiencing the exact same issue except it's saying "line 46, column 38". I'm using the preinstalled version of the script in the RetroPie image v. 1.8.1. I'd love to get the script working.

@gr0ebi
Copy link

gr0ebi commented Sep 26, 2013

did u use the last commit? I haven't used my pi for weeks, but i do remember that it was working at last.

@sphanley
Copy link

No, I'll try updating to the latest commit when I return home today. Thanks for the advice.

@phexe
Copy link
Contributor

phexe commented Oct 24, 2013

If you modify the Python script, you just need to change the location of a TRY: and reindent it.

#!/usr/bin/env python
import os, imghdr, urllib, urllib2, sys, Image, argparse, zlib, unicodedata, re
from xml.etree import ElementTree as ET
from xml.etree.ElementTree import Element, SubElement
parser = argparse.ArgumentParser(description='ES-scraper, a scraper for EmulationStation')
parser.add_argument("-w", metavar="value", help="defines a maximum width (in pixels) for boxarts (anything above that will be resized to that value)", type=int)
parser.add_argument("-noimg", help="disables boxart downloading", action='store_true')
parser.add_argument("-v", help="verbose output", action='store_true')
parser.add_argument("-f", help="force re-scraping (ignores and overwrites the current gamelist)", action='store_true')
parser.add_argument("-crc", help="CRC scraping", action='store_true')
parser.add_argument("-p", help="partial scraping (per console)", action='store_true')
parser.add_argument("-m", help="manual mode (choose from multiple results)", action='store_true')
parser.add_argument('-newpath', help="gamelist & boxart are written in $HOME/.emulationstation/%%NAME%%/", action='store_true')
parser.add_argument('-fix', help="temporary thegamesdb missing platform fix", action='store_true')
args = parser.parse_args()
def normalize(s):
   return ''.join((c for c in unicodedata.normalize('NFKD', unicode(s)) if unicodedata.category(c) != 'Mn'))
def fixExtension(file):    
    newfile="%s.%s" % (os.path.splitext(file)[0],imghdr.what(file))
    os.rename(file, newfile)
    return newfile
def readConfig(file):
    lines=config.read().splitlines()
    systems=[]
    for line in lines:
        if not line.strip() or line[0]=='#':
            continue
        else:
            if "NAME=" in line:
                name=line.split('=')[1]
            if "PATH=" in line:
                path=line.split('=')[1]
            elif "EXTENSION" in line:
                ext=line.split('=')[1]
            elif "PLATFORMID" in line:
                pid=line.split('=')[1]
                if not pid:
                    continue
                else:
                    system=(name,path,ext,pid)
                    systems.append(system)
    config.close()
    return systems
def crc(fileName):
    prev = 0
    for eachLine in open(fileName,"rb"):
        prev = zlib.crc32(eachLine, prev)
    return "%X"%(prev & 0xFFFFFFFF)
def indent(elem, level=0):
    i = "\n" + level*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for elem in elem:
            indent(elem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
def getPlatformName(id):
    url = "http://thegamesdb.net/api/GetPlatform.php"
    req = urllib2.Request(url, urllib.urlencode({'id':id}), headers={'User-Agent' : "RetroPie Scraper Browser"})
    data = urllib2.urlopen( req )
    platform_data = ET.parse(data)
    return platform_data.find('Platform/Platform').text
def exportList(gamelist):
    if gamelistExists and args.f is False:
        for game in gamelist.iter("game"):
            existinglist.getroot().append(game)
        indent(existinglist.getroot())
        ET.ElementTree(existinglist.getroot()).write("gamelist.xml")
        print "Done! %s updated." % os.getcwd()+"/gamelist.xml"
    else:
        indent(gamelist)
        ET.ElementTree(gamelist).write("gamelist.xml")
        print "Done! List saved on %s" % os.getcwd()+"/gamelist.xml"
def getFiles(base):
    dict=set([])
    for files in sorted(os.listdir(base)):
        if files.endswith(tuple(ES_systems[var][2].split(' '))):
            filepath=os.path.abspath(os.path.join(base, files))
            dict.add(filepath)
    return dict
def getGameInfo(file,platformID):
    title=re.sub(r'\[.*?\]|\(.*?\)', '', os.path.splitext(os.path.basename(file))[0]).strip()
    if args.crc:
        crcvalue=crc(file)
        if args.v:
            try:
                print "CRC for %s: %s" % (os.path.basename(file), crcvalue)
            except zlib.error as e:
                print e.strerror
        URL = "http://api.archive.vg/2.0/Game.getInfoByCRC/xml/7TTRM4MNTIKR2NNAGASURHJOZJ3QXQC5/%s" % crcvalue
        values={}
    else:
        URL = "http://thegamesdb.net/api/GetGame.php"
        platform = getPlatformName(platformID)
        if platform == "Arcade": title = getRealArcadeTitle(title)            
        
        if args.fix:
            try:                
                fixreq = urllib2.Request("http://thegamesdb.net/api/GetGamesList.php", urllib.urlencode({'name' : title, 'platform' : platform}), headers={'User-Agent' : "RetroPie Scraper Browser"})
                fixdata=ET.parse(urllib2.urlopen(fixreq)).getroot()
                if fixdata.find("Game") is not None:            
                    values={ 'id': fixdata.findall("Game/id")[chooseResult(fixdata)].text if args.m else fixdata.find("Game/id").text }
            except:
                return None
        else:
            values={'name':title,'platform':platform}
    try:
        req = urllib2.Request(URL,urllib.urlencode(values), headers={'User-Agent' : "RetroPie Scraper Browser"})
        remotedata = urllib2.urlopen( req )
        data=ET.parse(remotedata).getroot()
    except ET.ParseError:
        print "Malformed XML found, skipping game.. (source: {%s})" % URL
        return None
    try:
        if args.crc:
            result = data.find("games/game")
            if result is not None and result.find("title").text is not None:
                return result
        elif data.find("Game") is not None:
            return data.findall("Game")[chooseResult(data)] if args.m else data.find("Game")
        else:
            return None
    except Exception, err:
        print "Skipping game..(%s)" % str(err)
        return None
def getText(node):
    return normalize(node.text) if node is not None else None
def getTitle(nodes):
    if args.crc:
        return getText(nodes.find("title"))
    else:
        return getText(nodes.find("GameTitle"))
def getGamePlatform(nodes):
    if args.crc:
        return getText(nodes.find("system_title"))
    else:
        return getText(nodes.find("Platform"))
def getRealArcadeTitle(title):
    print "Fetching real title for %s from mamedb.com" % title
    URL  = "http://www.mamedb.com/game/%s" % title
    data = "".join(urllib2.urlopen(URL).readlines())
    m    = re.search('Name:.*(.+) .*Year', data)
    if m:
       print "Found real title %s for %s on mamedb.com" % (m.group(1), title)
       return m.group(1)
    else:
       print "No title found for %s on mamedb.com" % title
       return title
def getDescription(nodes):
    if args.crc:
        return getText(nodes.find("description"))
    else:
        return getText(nodes.find("Overview"))
def getImage(nodes):
    if args.crc:
        return getText(nodes.find("box_front"))
    else:
        return getText(nodes.find("Images/boxart[@side='front']"))
def getTGDBImgBase(nodes):
    return nodes.find("baseImgUrl").text
def getRelDate(nodes):
    if args.crc:
        return None
    else:
        return getText(nodes.find("ReleaseDate"))
def getPublisher(nodes):
    if args.crc:
        return None
    else:
        return getText(nodes.find("Publisher"))
def getDeveloper(nodes):
    if args.crc:
        return getText(nodes.find("developer"))
    else:
        return getText(nodes.find("Developer"))
def getGenres(nodes):
    genres=[]
    if args.crc and nodes.find("genre") is not None:
        for item in getText(nodes.find("genre")).split('>'):
            genres.append(item)
    elif nodes.find("Genres") is not None:
        for item in nodes.find("Genres").iter("genre"):
            genres.append(item.text)
    return genres if len(genres)>0 else None
def resizeImage(img,output):
    maxWidth= args.w
    if (img.size[0]>maxWidth):
        print "Boxart over %spx. Resizing boxart.." % maxWidth
        height = int((float(img.size[1])*float(maxWidth/float(img.size[0]))))
        img.resize((maxWidth,height), Image.ANTIALIAS).save(output)
def downloadBoxart(path,output):
    if args.crc:
        os.system("wget -q %s --output-document=\"%s\"" % (path,output))
    else:
        os.system("wget -q http://thegamesdb.net/banners/%s --output-document=\"%s\"" % (path,output))
def skipGame(list, filepath):
    for game in list.iter("game"):
        if game.findtext("path")==filepath:
            if args.v:
                print "Game \"%s\" already in gamelist. Skipping.." % os.path.basename(filepath)
            return True
def chooseResult(nodes):
    results=nodes.findall('Game')
    if len(results) > 1:
        for i,v in enumerate(results):
            try:
                print "[%s] %s | %s" % (i,getTitle(v), getGamePlatform(v))
            except Exception as e:
                print "Exception! %s %s %s" % (e, getTitle(v), getGamePlatform(v))
        return int(raw_input("Select a result (or press Enter to skip): "))
    else:
        return 0
def scanFiles(SystemInfo):
    name=SystemInfo[0]
    folderRoms=SystemInfo[1]
    extension=SystemInfo[2]
    platformID=SystemInfo[3]
    global gamelistExists
    global existinglist
    gamelistExists = False
    gamelist = Element('gameList')
    folderRoms = os.path.expanduser(folderRoms)
    if args.newpath is False:
        destinationFolder = folderRoms;
    else:
        destinationFolder = os.environ['HOME']+"/.emulationstation/%s/" % name
    try:
        os.chdir(destinationFolder)
    except OSError as e:
        print "%s : %s" % (destinationFolder, e.strerror)
        return
    print "Scanning folder..(%s)" % folderRoms
    if os.path.exists("gamelist.xml"):
        try:
            existinglist=ET.parse("gamelist.xml")
            gamelistExists=True
            if args.v:
                print "Gamelist already exists: %s" % os.path.abspath("gamelist.xml")
        except:
            gamelistExists=False
            print "There was an error parsing the list or file is empty"
    for root, dirs, allfiles in os.walk(folderRoms, followlinks=True):
        allfiles.sort()
        for files in allfiles:
            if files.endswith(tuple(extension.split(' '))):
                try:
                    filepath=os.path.abspath(os.path.join(root, files))
                    filename = os.path.splitext(files)[0]
    
                    if gamelistExists and not args.f:
                        if skipGame(existinglist,filepath):
                            continue
    
                    print "Trying to identify %s.." % files
    
                    data=getGameInfo(filepath, platformID)
     
                    if data is None:
                        continue
                    else:
                        result=data
    
                    str_title=getTitle(result)
                    str_des=getDescription(result)
                    str_img=getImage(result)
                    str_rd=getRelDate(result)
                    str_pub=getPublisher(result)
                    str_dev=getDeveloper(result)
                    lst_genres=getGenres(result)
    
                    if str_title is not None:
                        game = SubElement(gamelist, 'game')
                        path = SubElement(game, 'path')
                        name = SubElement(game, 'name')
                        desc = SubElement(game, 'desc')
                        image = SubElement(game, 'image')
                        releasedate = SubElement(game, 'releasedate')
                        publisher=SubElement(game, 'publisher')
                        developer=SubElement(game, 'developer')
                        genres=SubElement(game, 'genres')
    
                        path.text=filepath
                        name.text=str_title
                        print "Game Found: %s" % str_title
    
                    if str_des is not None:
                        desc.text=str_des
    
                    if str_img is not None and args.noimg is False:
                        if args.newpath is True:
                            imgpath="./" + filename+os.path.splitext(str_img)[1]
                        else:
                            imgpath=os.path.abspath(os.path.join(root, filename+os.path.splitext(str_img)[1]))
    
                        print "Downloading boxart.."
    
                        downloadBoxart(str_img,imgpath)
                        imgpath=fixExtension(imgpath)
                        image.text=imgpath
    
                        if args.w:
                            try:
                                resizeImage(Image.open(imgpath),imgpath)
                            except:
                                print "Image resize error"
    
                    if str_rd is not None:
                        releasedate.text=str_rd
    
                    if str_pub is not None:
                        publisher.text=str_pub
    
                    if str_dev is not None:
                        developer.text=str_dev
    
                    if lst_genres is not None:
                        for genre in lst_genres:
                            newgenre = SubElement(genres, 'genre')
                            newgenre.text=genre.strip()
                except KeyboardInterrupt:
                    print "Ctrl+C detected. Closing work now..."
                except Exception as e:
                    print "Exception caught! %s" % e
    if gamelist.find("game") is None:
        print "No new games added."
    else:
        print "{} games added.".format(len(gamelist))
        exportList(gamelist)
try:
    if os.getuid()==0:
        os.environ['HOME']="/home/"+os.getenv("SUDO_USER")
    config=open(os.environ['HOME']+"/.emulationstation/es_systems.cfg")
except IOError as e:
    sys.exit("Error when reading config file: %s \nExiting.." % e.strerror)
ES_systems=readConfig(config)
print parser.description
if args.w:
    print "Max width set: %spx." % str(args.w)
if args.noimg:
    print "Boxart downloading disabled."
if args.f:
    print "Re-scraping all games.."
if args.v:
    print "Verbose mode enabled."
if args.crc:
    print "CRC scraping enabled."
if args.p:
    print "Partial scraping enabled. Systems found:"
    for i,v in enumerate(ES_systems):
        print "[%s] %s" % (i,v[0])
    try:
        var = int(raw_input("System ID: "))
        scanFiles(ES_systems[var])
    except:
        sys.exit()
else:
    for i,v in enumerate(ES_systems):
        scanFiles(ES_systems[i])
print "All done!"

That code will catch exceptions on single entries, not entire systems :)

EDIT: moved "try:" from line 288, to 290 and fixed indenting below it.

@chugcup
Copy link
Contributor

chugcup commented Dec 14, 2013

I looked into it and the cause of the error was in the getPlatformName() method requesting from thegamesdb: the data coming back was an HTML error page which couldn't be parsed properly as XML. This was because the site changed their policy to require cookies (which urllib was not providing)

This supposedly was fixed in a recent commit: 080e61a so I'm guessing this issue should be closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants