Invalid cached data led to exceptions on read #472

Open
JamesRTaylor opened this issue Nov 30, 2020 · 4 comments

Comments

@JamesRTaylor
Contributor

Invalid cached data led to java.lang.ArrayIndexOutOfBoundsException and io.airlift.compress.MalformedInputException exceptions on read until the Presto worker was restarted. I don't have a reproducer, but I wanted to let you know that we saw this occurring. We're on 0.3.19 with Presto 338, using m5d.12xlarge worker nodes in AWS.

@JamesRTaylor
Contributor Author

FYI, @shubhamtagra: we continue to run into this issue. It occurs when the cluster is overloaded. Essentially, invalid files are loaded into the cache, and from then on every query that reads them fails.

@JamesRTaylor
Contributor Author

Verified that the issue occurs in both read-through and async modes. Unfortunately, I had to disable caching because of it.

@raunaqmorarka
Member

Could you add a stack trace for the exception that you saw?

@JamesRTaylor
Contributor Author

io.prestosql.spi.PrestoException: Error opening Hive split s3://lyftqubole-iad/qubole/t/yrotsi/PRODUCTION/stage/subscriptions_history_7cdfa8/ds=2020-12-01/20201202_140043_07242_5a5n7_adae9373-a180-4d2e-9ae8-57f3f71f7e44 (offset=0, length=67571396): can not read class org.apache.parquet.format.FileMetaData: don't know what type: 15
	at io.prestosql.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:242)
	at io.prestosql.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:132)
	at io.prestosql.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:178)
	at io.prestosql.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:105)
	at io.prestosql.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:57)
	at io.prestosql.split.PageSourceManager.createPageSource(PageSourceManager.java:64)
	at io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:254)
	at io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:182)
	at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.prestosql.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
	at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.prestosql.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:149)
	at io.prestosql.operator.Driver.processInternal(Driver.java:379)
	at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
	at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
	at io.prestosql.operator.Driver.processFor(Driver.java:276)
	at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
	at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.prestosql.$gen.Presto_____20201203_154652_2.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: can not read class org.apache.parquet.format.FileMetaData: don't know what type: 15
	at org.apache.parquet.format.Util.read(Util.java:225)
	at org.apache.parquet.format.Util.readFileMetaData(Util.java:86)
	at io.prestosql.parquet.reader.MetadataReader.readFooter(MetadataReader.java:109)
	at io.prestosql.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:176)
	... 31 more
Caused by: io.prestosql.hive.$internal.parquet.org.apache.thrift.protocol.TProtocolException: don't know what type: 15
	at io.prestosql.hive.$internal.parquet.org.apache.thrift.protocol.TCompactProtocol.getTType(TCompactProtocol.java:898)
	at io.prestosql.hive.$internal.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:562)
	at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:155)
	at org.apache.parquet.format.ColumnMetaData$ColumnMetaDataStandardScheme.read(ColumnMetaData.java:1630)
	at org.apache.parquet.format.ColumnMetaData$ColumnMetaDataStandardScheme.read(ColumnMetaData.java:1623)
	at org.apache.parquet.format.ColumnMetaData.read(ColumnMetaData.java:1464)
	at org.apache.parquet.format.ColumnChunk$ColumnChunkStandardScheme.read(ColumnChunk.java:1101)
	at org.apache.parquet.format.ColumnChunk$ColumnChunkStandardScheme.read(ColumnChunk.java:1070)
	at org.apache.parquet.format.ColumnChunk.read(ColumnChunk.java:954)
	at org.apache.parquet.format.RowGroup$RowGroupStandardScheme.read(RowGroup.java:932)
	at org.apache.parquet.format.RowGroup$RowGroupStandardScheme.read(RowGroup.java:911)
	at org.apache.parquet.format.RowGroup.read(RowGroup.java:818)
	at org.apache.parquet.format.FileMetaData$FileMetaDataStandardScheme.read(FileMetaData.java:1298)
	at org.apache.parquet.format.FileMetaData$FileMetaDataStandardScheme.read(FileMetaData.java:1242)
	at org.apache.parquet.format.FileMetaData.read(FileMetaData.java:1116)
	at org.apache.parquet.format.Util.read(Util.java:222)
	... 34 more
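
The trace above fails while decoding the Parquet footer (FileMetaData), which is consistent with the cache returning bytes that differ from the source file. As a rough diagnostic, the sketch below checks the fixed Parquet trailer on bytes read back through the cache: an unencrypted Parquet file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the footer length as a little-endian integer. The class and method names are hypothetical and are not part of Presto or the caching library. The check only catches gross corruption (wrong magic, implausible footer length); it would not detect a flipped byte inside the footer itself, which is what the Thrift error suggests may have happened here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Presto or the cache library.
final class ParquetTrailerCheck {
    // Magic bytes of an unencrypted Parquet file, present at both the start and the end.
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    private ParquetTrailerCheck() {}

    // Returns true if the bytes carry the leading and trailing magic and a plausible footer length.
    static boolean hasValidTrailer(byte[] file) {
        // Smallest possible layout: 4-byte magic + 4-byte footer length + 4-byte magic.
        if (file.length < 12) {
            return false;
        }
        if (!matchesMagic(file, 0) || !matchesMagic(file, file.length - 4)) {
            return false;
        }
        // The footer length sits in the 4 bytes before the trailing magic, little-endian.
        int footerLength = ByteBuffer.wrap(file, file.length - 8, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        // The footer must fit between the leading magic and the length field.
        return footerLength > 0 && footerLength <= file.length - 12;
    }

    private static boolean matchesMagic(byte[] data, int offset) {
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[offset + i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Running the same check on bytes fetched directly from S3 and on bytes served by the cache for the same path would show whether the two disagree. A full byte-for-byte comparison of the two reads is the stronger test when it is affordable.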
