bug: NPE when processing ByteArrayResource #1571

Closed
@ogbozoyan

I've also opened a proposal to fix this: #1572
Bug description
If the Resource being processed has no filename (getFilename() returns null, as it does for ByteArrayResource), org.springframework.ai.transformer.splitter.TextSplitter#createDocuments throws an NPE.

Environment
springAiVersion = "1.0.0-M3", Java 21, PgVectorStore

PgVectorStore compose file
I also have an init SQL script

Steps to reproduce
I have an endpoint, CronController, that publishes an event to PgEventListener. The listener asynchronously calls PGVectorStoreService, where I wrote a wrapper method that enriches each document with a filename. If I pass documents to org.springframework.ai.transformer.splitter.TextSplitter#createDocuments without a filename in their metadata, the collector throws an uninformative NPE when e.getValue() returns null:

Map<String, Object> metadataCopy = metadata.entrySet()
					.stream()
					.collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));
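The failure can be reproduced without Spring AI, because Collectors.toMap rejects null values regardless of where the map comes from. The sketch below is only an illustration: the class name, the helper methods, and the "file_name" and "page_number" keys are assumptions chosen for the example, not code taken from TextSplitter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapNullValueDemo {

    // Same copy pattern as the snippet above: Collectors.toMap rejects null values.
    static Map<String, Object> copyWithToMap(Map<String, Object> metadata) {
        return metadata.entrySet()
            .stream()
            .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));
    }

    // Returns true if copying the given metadata throws a NullPointerException.
    static boolean throwsNpe(Map<String, Object> metadata) {
        try {
            copyWithToMap(metadata);
            return false;
        } catch (NullPointerException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put("page_number", 1);
        // Hypothetical key: stands in for a filename that the Resource could not supply.
        metadata.put("file_name", null);
        System.out.println("NPE thrown: " + throwsNpe(metadata));
    }
}
```

A HashMap accepts null values, so the null only becomes a problem once the map is copied through Collectors.toMap.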

Expected behavior
The metadata copy should be null-safe: a null metadata value should not make Collectors.toMap throw an NPE.
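One way a null-safe copy could look is sketched below. This is not necessarily what the proposal in #1572 does; the class and method names are hypothetical, and the choice between keeping and dropping null entries is left open.

```java
import java.util.HashMap;
import java.util.Map;

public class NullSafeMetadataCopy {

    // Copies the metadata and keeps null values (HashMap permits them).
    static Map<String, Object> copyKeepingNulls(Map<String, Object> metadata) {
        return new HashMap<>(metadata);
    }

    // Copies the metadata and drops every entry whose value is null.
    static Map<String, Object> copyDroppingNulls(Map<String, Object> metadata) {
        Map<String, Object> copy = new HashMap<>();
        metadata.forEach((key, value) -> {
            if (value != null) {
                copy.put(key, value);
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put("page_number", 1);
        metadata.put("file_name", null); // hypothetical key with a missing value
        System.out.println(copyKeepingNulls(metadata).size());
        System.out.println(copyDroppingNulls(metadata).size());
    }
}
```

Dropping null entries may be preferable if a downstream consumer of the metadata also rejects null values, but that is a design choice for the maintainers.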

Minimal Complete Reproducible example

20-10-2024 23:06:50.018  -  INFO 52440 [Async-1]  r.o.cron.service.pg.PgEventListener:16  : Processing event: PgEvent(resource=Byte array resource [resource loaded from byte array], type=PDF, fileName=Cloud_Architecture_Demystified_Understand_how_to_design_sustainable.pdf)
20-10-2024 23:06:50.019  -  INFO 52440 [Async-1]  r.o.cron.service.pg.PgEventListener:19  : PDF processing event: PgEvent(resource=Byte array resource [resource loaded from byte array], type=PDF, fileName=Cloud_Architecture_Demystified_Understand_how_to_design_sustainable.pdf)
20-10-2024 23:06:50.099  -  INFO 52440 [Async-1]  r.o.c.service.pg.PGVectorStoreService:83  : Loading Cloud_Architecture_Demystified_Understand_how_to_design_sustainable.pdf Reference PDF into Vector Store
20-10-2024 23:06:50.232  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 1
20-10-2024 23:06:50.429  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 23
20-10-2024 23:06:50.520  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 45
20-10-2024 23:06:50.597  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 67
20-10-2024 23:07:16.583  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 89
20-10-2024 23:07:16.657  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 111
20-10-2024 23:07:16.749  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 133
20-10-2024 23:07:16.825  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 155
20-10-2024 23:07:16.899  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 177
20-10-2024 23:07:16.981  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 199
20-10-2024 23:07:17.087  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:156  : Processing 228 pages
20-10-2024 23:07:17.097  - ERROR 52440 [Async-1]  r.o.c.service.pg.PGVectorStoreService:95  : Error while loading PDF Cloud_Architecture_Demystified_Understand_how_to_design_sustainable.pdf into Vector Store. Exception: NullPointerException - Message: null
20-10-2024 23:07:17.098  - ERROR 52440 [Async-1]  o.s.a.i.SimpleAsyncUncaughtExceptionHandler:39  : Unexpected exception occurred invoking async method: public void ru.ogbozoyan.cron.service.pg.PgEventListener.process(ru.ogbozoyan.cron.service.pg.PgEvent)

java.lang.NullPointerException: null
	at java.base/java.util.Objects.requireNonNull(Objects.java:233) ~[na:na]
	at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180) ~[na:na]
	at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) ~[na:na]
	at java.base/java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1858) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[na:na]
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:na]
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[na:na]
	at org.springframework.ai.transformer.splitter.TextSplitter.createDocuments(TextSplitter.java:91) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3]
	at org.springframework.ai.transformer.splitter.TextSplitter.doSplitDocuments(TextSplitter.java:71) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3]
	at org.springframework.ai.transformer.splitter.TextSplitter.apply(TextSplitter.java:41) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3]
	at ru.ogbozoyan.cron.service.pg.PGVectorStoreService.saveNewPDFAsync(PGVectorStoreService.kt:91) ~[main/:na]
	at ru.ogbozoyan.cron.service.pg.PgEventListener.process(PgEventListener.kt:20) ~[main/:na]
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[na:na]
	at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[na:na]
