Parsing feed fails if it has html encoded characters #204

Open
shtolik opened this issue Aug 6, 2024 · 2 comments

shtolik commented Aug 6, 2024

Describe the bug
I tried to parse the feed https://myrskyla.fi/feed/, but a title tag contains the HTML entity &Auml; instead of the literal character Ä. This leads to an exception, and parsing the feed fails on both Android and iOS.
Android:

RssParsingException(message=Something went wrong during the parsing of the feed. Please check if the XML is valid, cause=org.xmlpull.v1.XmlPullParserException: unresolved: &auml; (position:TEXT @11:22 in java.io.InputStreamReader@4290534) )
at com.prof18.rssparser.internal.AndroidXmlParser$parseXML$2.invokeSuspend(AndroidXmlParser.kt:67)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111)
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:585)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:802)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:706)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:693)
Caused by: org.xmlpull.v1.XmlPullParserException: unresolved: &auml; (position:TEXT @11:22 in java.io.InputStreamReader@4290534)
at com.android.org.kxml2.io.KXmlParser.checkRelaxed(KXmlParser.java:305)
at com.android.org.kxml2.io.KXmlParser.readEntity(KXmlParser.java:1285)
at com.android.org.kxml2.io.KXmlParser.readValue(KXmlParser.java:1402)
at com.android.org.kxml2.io.KXmlParser.next(KXmlParser.java:393)
at com.android.org.kxml2.io.KXmlParser.next(KXmlParser.java:313)
at com.android.org.kxml2.io.KXmlParser.nextText(KXmlParser.java:2077)
at com.prof18.rssparser.internal.XmlPullParser_Kt.nextTrimmedText(XmlPullParser+.kt:5)
at com.prof18.rssparser.internal.rss.RssParserKt.extractRSSContent(RssParser.kt:289)
at com.prof18.rssparser.internal.AndroidXmlParser$parseXML$2.invokeSuspend(AndroidXmlParser.kt:54)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) 
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104) 
at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111) 
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99) 
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:585) 
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:802) 
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:706) 
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:693) 

iOS:

0   composeui                           0x10e50c5d7        kfun:kotlin.Throwable#<init>(){} + 95 (/opt/buildAgent/work/b2e1db4d8d903ca4/kotlin/kotlin-native/runtime/src/main/kotlin/kotlin/Throwable.kt:32:28)
1   composeui                           0x10e50589f        kfun:kotlin.Exception#<init>(){} + 87 (/opt/buildAgent/work/b2e1db4d8d903ca4/kotlin/kotlin-native/runtime/src/main/kotlin/kotlin/Exceptions.kt:21:35)
2   composeui                           0x110063c33        kfun:com.prof18.rssparser.exception.RssParsingException#<init>(kotlin.String?;kotlin.Throwable?){} + 107 (/Users/runner/work/RSS-Parser/RSS-Parser/rssparser/src/commonMain/kotlin/com/prof18/rssparser/exception/RssParsingException.kt:12:5)
3   composeui                           0x11008ed37        kfun:com.prof18.rssparser.internal.IosXmlParser.parseXML$lambda$3$lambda$1#internal + 299 (/Users/runner/work/RSS-Parser/RSS-Parser/rssparser/src/iosMain/kotlin/com/prof18/rssparser/internal/IosXmlParser.kt:32:33)
4   composeui                           0x11008fc37        kfun:com.prof18.rssparser.internal.IosXmlParser.$parseXML$lambda$3$lambda$1$FUNCTION_REFERENCE$2.invoke#internal + 103 (/Users/runner/work/RSS-Parser/RSS-Parser/rssparser/src/iosMain/kotlin/com/prof18/rssparser/internal/IosXmlParser.kt:26:13)

The link of the RSS Feed
https://myrskyla.fi/feed/

I was able to work around it by manually replacing this entity (and there are likely more offending characters, see http://www.javascripter.net/faq/accentedcharacters.htm):

val feedString = xmlFetcher.fetchXmlAsString(url)
// Replace HTML entities that the XML parser cannot resolve with the literal characters
val feedStringFixed = feedString
    .replace("&auml;", "ä")
    .replace("&Ouml;", "Ö")
val channel = parser.parse(feedStringFixed)

But I had to fetch the feed myself because the built-in XmlFetcher is an internal class.
So it would be good to:

  1. try unescaping such characters if parsing fails, and/or make the XmlFetcher interface accessible
  2. add the possibility to override or use XmlFetcher.
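
For anyone who needs more than a couple of hard-coded replacements, the workaround above could be generalized roughly as follows. This is only a sketch: `htmlEntities`, `namedEntity` and `replaceHtmlEntities` are hypothetical names that are not part of RSS-Parser, and the table covers just a handful of entities that would have to be extended for real feeds.

```kotlin
// Hypothetical helper, not part of RSS-Parser.
// Maps a few named HTML entities to the characters they stand for.
// XML's own predefined entities (amp, lt, gt, quot, apos) are deliberately absent.
val htmlEntities: Map<String, String> = mapOf(
    "Auml" to "Ä", "auml" to "ä",
    "Ouml" to "Ö", "ouml" to "ö",
    "Aring" to "Å", "aring" to "å",
    "nbsp" to "\u00A0",
)

// Matches named entity references such as &auml; and captures the name.
val namedEntity = Regex("&([A-Za-z][A-Za-z0-9]*);")

fun replaceHtmlEntities(xml: String): String =
    namedEntity.replace(xml) { match ->
        // Names missing from the table are left exactly as they were.
        htmlEntities[match.groupValues[1]] ?: match.value
    }
```

The result of `replaceHtmlEntities(feedString)` can then be passed to `parser.parse(...)` in the same way as `feedStringFixed` above. Because unknown names are left alone, valid references such as `&amp;` survive unchanged.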
@shtolik shtolik added the bug label Aug 6, 2024

kbios commented Oct 2, 2024

This also affects RSS feeds which fail to escape the ampersand when it's used in the text, like the arstechnica one (as of now): https://feeds.arstechnica.com/arstechnica/index

(Attached below for posterity)
arstechnica.txt
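
A possible pre-processing step for this unescaped-ampersand case, as a minimal sketch: escape every `&` that does not start a well-formed reference before handing the string to the parser. `bareAmpersand` and `escapeBareAmpersands` are hypothetical names, not library API. The sketch assumes the affected text is not inside a CDATA section, where a bare `&` is legal and would be escaped needlessly.

```kotlin
// Hypothetical helper, not part of RSS-Parser.
// Matches an ampersand that does NOT begin a well-formed reference, i.e. one that is
// not followed by a named entity (&name;), a decimal reference (&#228;)
// or a hexadecimal reference (&#xE4;).
val bareAmpersand = Regex("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

fun escapeBareAmpersands(xml: String): String =
    // Existing references are left alone, so the function can be applied repeatedly.
    bareAmpersand.replace(xml, "&amp;")
```

Named HTML entities such as `&auml;` still look like well-formed references to this pattern, so they pass through untouched and need separate handling.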

Owner

prof18 commented Oct 7, 2024

Thanks for reporting this issue. The "right" way would be to have the feed owner add the proper CDATA escape.

I've done some research and there's no "smart" way to fix that.

I'll consider adding some settings in the builder to allow replacing some strings, but for now, the suggested way is manually fetching the feed as a string and parsing it with the parse method.
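
Until such a setting exists, the "replace some strings" idea can be approximated on the caller side. The sketch below is an assumption about how that might look, not the library's API: `parseWithReplacements` is a hypothetical name, `parse` stands in for whatever turns the feed string into a result, and `replacements` is supplied by the caller. It is written for a plain (non-suspending) `parse` function; the library's real parse call may well be a suspend function, in which case the wrapper would need to be adapted.

```kotlin
// Hypothetical wrapper, not part of RSS-Parser.
// Tries to parse the raw feed first; only if that throws, applies the
// caller-supplied string replacements and tries once more.
fun <T> parseWithReplacements(
    raw: String,
    replacements: Map<String, String>,
    parse: (String) -> T,
): T = try {
    parse(raw)
} catch (e: Exception) {
    // Apply every replacement in turn to the original string.
    val cleaned = replacements.entries.fold(raw) { acc, (from, to) ->
        acc.replace(from, to)
    }
    // A second failure is propagated to the caller unchanged.
    parse(cleaned)
}
```

Retrying only on failure keeps well-formed feeds byte-for-byte untouched, at the cost of parsing broken feeds twice.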

@prof18 prof18 added feature request and removed bug labels Oct 7, 2024