bug: NPE when processing ByteArrayResource #1571

I've also created a proposal for how to fix it: https://github.com/spring-projects/spring-ai/pull/1572

**Bug description**
If the `Resource` being processed has no filename (for example a `ByteArrayResource`, whose `getFilename()` returns `null`), `org.springframework.ai.transformer.splitter.TextSplitter#createDocuments` throws an NPE.

**Environment**
Spring AI 1.0.0-M3 (`springAiVersion = "1.0.0-M3"`), Java 21, PgVectorStore

PgVectorStore [compose file](https://github.com/ogbozoyan/cron-validator/blob/main/pgvector/docker-compose.yaml)
I also have an [init SQL script](https://github.com/ogbozoyan/cron-validator/blob/main/src/main/resources/db/migration/V1__init.sql)


**Steps to reproduce**
I have an endpoint, [CronController](https://github.com/ogbozoyan/cron-validator/blob/main/src/main/kotlin/ru/ogbozoyan/cron/web/controller/CronController.kt), which publishes an event to [PgEventListener](https://github.com/ogbozoyan/cron-validator/blob/main/src/main/kotlin/ru/ogbozoyan/cron/service/pg/PgEventListener.kt).
The listener asynchronously calls [PGVectorStoreService](https://github.com/ogbozoyan/cron-validator/blob/main/src/main/kotlin/ru/ogbozoyan/cron/service/pg/PGVectorStoreService.kt), where I wrote a wrapper method that enriches each document with the filename. If I pass documents to `org.springframework.ai.transformer.splitter.TextSplitter#createDocuments`
without a filename in their metadata, the collector throws an unclear NPE when it evaluates `e.getValue()`:

```java
Map<String, Object> metadataCopy = metadata.entrySet()
					.stream()
					.collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));
``` 
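The failure can be reproduced without Spring AI at all. The following is a minimal sketch in plain Java, assuming the metadata map contains a key whose value is `null`. The class name `ToMapNullRepro` and the keys `file_name` and `page_number` are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapNullRepro {

    // Same shape as the copy in TextSplitter#createDocuments quoted above:
    // Collectors.toMap rejects null values with a NullPointerException.
    static Map<String, Object> copyWithToMap(Map<String, Object> metadata) {
        return metadata.entrySet()
            .stream()
            .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put("page_number", 1);  // hypothetical key
        metadata.put("file_name", null); // hypothetical key; stands in for a missing filename

        try {
            copyWithToMap(metadata);
            System.out.println("copied without error");
        } catch (NullPointerException ex) {
            // The exception carries no message, which matches "Message: null" in the log.
            System.out.println("NPE from Collectors.toMap, message: " + ex.getMessage());
        }
    }
}
```

A `HashMap` itself accepts `null` values, so the map can be built without complaint. Only the `Collectors.toMap` copy fails.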

**Expected behavior**
`Collectors.toMap` should be called safely: a `null` metadata value should not cause an unclear NPE.
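Two possible ways to make the copy null-safe are sketched below. This is only an illustration in plain Java and not necessarily what the linked pull request does. The class name `SafeMetadataCopy`, both method names, and the metadata keys are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class SafeMetadataCopy {

    // Option 1: drop entries with a null value before collecting,
    // so Collectors.toMap never sees a null.
    static Map<String, Object> copyDroppingNulls(Map<String, Object> metadata) {
        return metadata.entrySet()
            .stream()
            .filter(e -> e.getValue() != null)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    // Option 2: still fail on null values, but with a message that names the key.
    static Map<String, Object> copyOrExplain(Map<String, Object> metadata) {
        metadata.forEach((key, value) ->
            Objects.requireNonNull(value,
                "Metadata value for key '" + key + "' must not be null"));
        return new HashMap<>(metadata);
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put("page_number", 1);  // hypothetical key
        metadata.put("file_name", null); // hypothetical key

        System.out.println(copyDroppingNulls(metadata));

        try {
            copyOrExplain(metadata);
        } catch (NullPointerException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Option 1 silently loses the key, while option 2 keeps the failure but makes its cause visible. Which one fits better depends on whether downstream code relies on the key being present.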

**Minimal Complete Reproducible example**

```log
20-10-2024 23:06:50.018  -  INFO 52440 [Async-1]  r.o.cron.service.pg.PgEventListener:16  : Processing event: PgEvent(resource=Byte array resource [resource loaded from byte array], type=PDF, fileName=Cloud_Architecture_Demystified_Understand_how_to_design_sustainable.pdf)
20-10-2024 23:06:50.019  -  INFO 52440 [Async-1]  r.o.cron.service.pg.PgEventListener:19  : PDF processing event: PgEvent(resource=Byte array resource [resource loaded from byte array], type=PDF, fileName=Cloud_Architecture_Demystified_Understand_how_to_design_sustainable.pdf)
20-10-2024 23:06:50.099  -  INFO 52440 [Async-1]  r.o.c.service.pg.PGVectorStoreService:83  : Loading Cloud_Architecture_Demystified_Understand_how_to_design_sustainable.pdf Reference PDF into Vector Store
20-10-2024 23:06:50.232  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 1
20-10-2024 23:06:50.429  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 23
20-10-2024 23:06:50.520  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 45
20-10-2024 23:06:50.597  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 67
20-10-2024 23:07:16.583  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 89
20-10-2024 23:07:16.657  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 111
20-10-2024 23:07:16.749  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 133
20-10-2024 23:07:16.825  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 155
20-10-2024 23:07:16.899  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 177
20-10-2024 23:07:16.981  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:114  : Processing PDF page: 199
20-10-2024 23:07:17.087  -  INFO 52440 [Async-1]  o.s.ai.reader.pdf.PagePdfDocumentReader:156  : Processing 228 pages
20-10-2024 23:07:17.097  - ERROR 52440 [Async-1]  r.o.c.service.pg.PGVectorStoreService:95  : Error while loading PDF Cloud_Architecture_Demystified_Understand_how_to_design_sustainable.pdf into Vector Store. Exception: NullPointerException - Message: null
20-10-2024 23:07:17.098  - ERROR 52440 [Async-1]  o.s.a.i.SimpleAsyncUncaughtExceptionHandler:39  : Unexpected exception occurred invoking async method: public void ru.ogbozoyan.cron.service.pg.PgEventListener.process(ru.ogbozoyan.cron.service.pg.PgEvent)

java.lang.NullPointerException: null
	at java.base/java.util.Objects.requireNonNull(Objects.java:233) ~[na:na]
	at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180) ~[na:na]
	at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) ~[na:na]
	at java.base/java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1858) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[na:na]
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:na]
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[na:na]
	at org.springframework.ai.transformer.splitter.TextSplitter.createDocuments(TextSplitter.java:91) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3]
	at org.springframework.ai.transformer.splitter.TextSplitter.doSplitDocuments(TextSplitter.java:71) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3]
	at org.springframework.ai.transformer.splitter.TextSplitter.apply(TextSplitter.java:41) ~[spring-ai-core-1.0.0-M3.jar:1.0.0-M3]
	at ru.ogbozoyan.cron.service.pg.PGVectorStoreService.saveNewPDFAsync(PGVectorStoreService.kt:91) ~[main/:na]
	at ru.ogbozoyan.cron.service.pg.PgEventListener.process(PgEventListener.kt:20) ~[main/:na]
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[na:na]
	at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[na:na]
```

