Description
I've also opened a proposal for how to fix it: #1572
Bug description
If the Resource being processed has no file name (getFilename() returns null), org.springframework.ai.transformer.splitter.TextSplitter#createDocuments throws an NPE.
Environment
springAiVersion = "1.0.0-M3", Java 21, PgVectorStore
PgVectorStore compose file
I also have an init SQL script
Steps to reproduce
I have an endpoint, CronController, which publishes an event to PgEventListener.
The listener asynchronously calls PGVectorStoreService, where I wrote a wrapper method that enriches each document with its file name. If I pass documents to org.springframework.ai.transformer.splitter.TextSplitter#createDocuments
without a file name in the metadata (the value is null), the collector fails when it reads e.getValue()
and throws an NPE with no message, which makes the cause hard to identify:
Map<String, Object> metadataCopy = metadata.entrySet()
    .stream()
    .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));
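The failure can be reproduced without Spring AI at all. The following is a minimal sketch that applies the same copy pattern as the snippet above to a plain map containing one null value. The class name, the helper names, and the metadata keys are illustrative assumptions and are not taken from the Spring AI sources.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapNullValueRepro {

    // Same copy pattern as in the snippet above: Collectors.toMap rejects null values.
    static Map<String, Object> copyWithToMap(Map<String, Object> metadata) {
        return metadata.entrySet()
            .stream()
            .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));
    }

    // Returns true if copying the given metadata throws a NullPointerException.
    static boolean throwsNpe(Map<String, Object> metadata) {
        try {
            copyWithToMap(metadata);
            return false;
        } catch (NullPointerException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put("page_number", 1);
        // Illustrative key: stands in for a file name that the Resource could not provide.
        metadata.put("file_name", null);

        System.out.println("NPE thrown: " + throwsNpe(metadata)); // prints "NPE thrown: true"
    }
}
```

This matches the top frames of the stack trace below, where Collectors.uniqKeysMapAccumulator calls Objects.requireNonNull on the mapped value.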
Expected behavior
The metadata copy should be null-safe: a null metadata value should not make Collectors.toMap throw an NPE.
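As a rough illustration of what a null-safe copy could look like, here are two possible approaches on a plain map. This is only a sketch under my own assumptions; it is not necessarily what #1572 does, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class NullSafeMetadataCopy {

    // Option A: copy constructor. HashMap permits null values, so they are preserved.
    static Map<String, Object> copyKeepingNulls(Map<String, Object> metadata) {
        return new HashMap<>(metadata);
    }

    // Option B: keep the stream, but drop entries with a null value before collecting.
    static Map<String, Object> copyDroppingNulls(Map<String, Object> metadata) {
        return metadata.entrySet()
            .stream()
            .filter(e -> e.getValue() != null)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put("page_number", 1);
        metadata.put("file_name", null); // illustrative key with a missing value

        System.out.println(copyKeepingNulls(metadata).containsKey("file_name"));  // prints "true"
        System.out.println(copyDroppingNulls(metadata).containsKey("file_name")); // prints "false"
    }
}
```

Option A keeps the key with a null value, which may still surprise code further down the pipeline. Option B silently drops the entry. Either way the copy itself no longer throws.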
Minimal Complete Reproducible example
20-10-2024 23:06:50.018 - INFO 52440 [Async-1] r.o.cron.service.pg.PgEventListener:16 : Processing event: PgEvent(resource=Byte array resource [resource loaded from byte array], type=PDF, fileName=Cloud_Architecture_Demystified_Understand_how_to_design_sustainable.pdf)
20-10-2024 23:06:50.019 - INFO 52440 [Async-1] r.o.cron.service.pg.PgEventListener:19 : PDF processing event: PgEvent(resource=Byte array resource [resource loaded from byte array], type=PDF, fileName=Cloud_Architecture_Demystified_Understand_how_to_design_sustainable.pdf)
20-10-2024 23:06:50.099 - INFO 52440 [Async-1] r.o.c.service.pg.PGVectorStoreService:83 : Loading Cloud_Architecture_Demystified_Understand_how_to_design_sustainable.pdf Reference PDF into Vector Store
20-10-2024 23:06:50.232 - INFO 52440 [Async-1] o.s.ai.reader.pdf.PagePdfDocumentReader:114 : Processing PDF page: 1
20-10-2024 23:06:50.429 - INFO 52440 [Async-1] o.s.ai.reader.pdf.PagePdfDocumentReader:114 : Processing PDF page: 23
20-10-2024 23:06:50.520 - INFO 52440 [Async-1] o.s.ai.reader.pdf.PagePdfDocumentReader:114 : Processing PDF page: 45
20-10-2024 23:06:50.597 - INFO 52440 [Async-1] o.s.ai.reader.pdf.PagePdfDocumentReader:114 : Processing PDF page: 67
20-10-2024 23:07:16.583 - INFO 52440 [Async-1] o.s.ai.reader.pdf.PagePdfDocumentReader:114 : Processing PDF page: 89
20-10-2024 23:07:16.657 - INFO 52440 [Async-1] o.s.ai.reader.pdf.PagePdfDocumentReader:114 : Processing PDF page: 111
20-10-2024 23:07:16.749 - INFO 52440 [Async-1] o.s.ai.reader.pdf.PagePdfDocumentReader:114 : Processing PDF page: 133
20-10-2024 23:07:16.825 - INFO 52440 [Async-1] o.s.ai.reader.pdf.PagePdfDocumentReader:114 : Processing PDF page: 155
20-10-2024 23:07:16.899 - INFO 52440 [Async-1] o.s.ai.reader.pdf.PagePdfDocumentReader:114 : Processing PDF page: 177
20-10-2024 23:07:16.981 - INFO 52440 [Async-1] o.s.ai.reader.pdf.PagePdfDocumentReader:114 : Processing PDF page: 199
20-10-2024 23:07:17.087 - INFO 52440 [Async-1] o.s.ai.reader.pdf.PagePdfDocumentReader:156 : Processing 228 pages
20-10-2024 23:07:17.097 - ERROR 52440 [Async-1] r.o.c.service.pg.PGVectorStoreService:95 : Error while loading PDF Cloud_Architecture_Demystified_Understand_how_to_design_sustainable.pdf into Vector Store. Exception: NullPointerException - Message: null
20-10-2024 23:07:17.098 - ERROR 52440 [Async-1] o.s.a.i.SimpleAsyncUncaughtExceptionHandler:39 : Unexpected exception occurred invoking async method: public void ru.ogbozoyan.cron.service.pg.PgEventListener.process(ru.ogbozoyan.cron.service.pg.PgEvent)
java.lang.NullPointerException: null
at java.base/java.util.Objects.requireNonNull(Objects.java:233) ~[na:na]
at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180) ~[na:na]
at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) ~[na:na]
at java.base/java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1858) ~[na:na]
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[na:na]
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[na:na]
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[na:na]
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:na]
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[na:na]
at org.springframework.ai.transformer.splitter.TextSplitter.createDocuments(TextSplitter.java:91) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3]
at org.springframework.ai.transformer.splitter.TextSplitter.doSplitDocuments(TextSplitter.java:71) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3]
at org.springframework.ai.transformer.splitter.TextSplitter.apply(TextSplitter.java:41) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3]
at ru.ogbozoyan.cron.service.pg.PGVectorStoreService.saveNewPDFAsync(PGVectorStoreService.kt:91) ~[main/:na]
at ru.ogbozoyan.cron.service.pg.PgEventListener.process(PgEventListener.kt:20) ~[main/:na]
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[na:na]