Recursive iteration #36

Merged
merged 40 commits into from
Nov 24, 2024
b408762
Working
halaxa Oct 21, 2022
1466ddb
\JsonMachineTest\ParserTest::testZigZagRecursiveIteration
halaxa Oct 22, 2022
e21abf9
Added 'recursive' option
halaxa Oct 22, 2022
64b5f23
Documentation
halaxa Oct 22, 2022
aba5205
Build fixed
halaxa Oct 22, 2022
ccd4a7d
Tokens reverted. Iterator memoization moved from Tokens to Parser
halaxa Oct 22, 2022
adcfcde
Removed useless condition
halaxa Oct 22, 2022
d40b040
$jsonBuffer -> $jsonValue
halaxa Oct 22, 2022
bda567a
Finishing of an unfinished sub-iterator for convenience
halaxa Oct 23, 2022
da88e4d
Readme fix
halaxa Oct 24, 2022
b22c24d
NestedIterator skeleton
halaxa Oct 24, 2022
3b3129b
advanceToKey()
halaxa Oct 26, 2022
55b26de
toArray()
halaxa Oct 27, 2022
27ba49f
Merge branch 'master' into recursive
halaxa Nov 29, 2023
fd46d46
PHPStan fixes + testRecursiveIterationYieldsNestedIterator
halaxa Nov 30, 2023
01fc434
Readme update
halaxa Nov 30, 2023
754d360
RecursiveItems facade
halaxa Dec 1, 2023
cf83311
NestedIterator replaced with RecursiveItems
halaxa Aug 26, 2024
ef546c5
Merged master into recursive
halaxa Nov 19, 2024
8da949f
Fixed failing testRecursiveParserDoesNotRequireChildParserToBeIterate…
halaxa Nov 20, 2024
610a127
Removed empty test
halaxa Nov 20, 2024
8dd061c
Code hack fixed
halaxa Nov 20, 2024
85aeb9a
Parser::getPosition() works inside nested collections
halaxa Nov 20, 2024
f8fad15
wip
halaxa Nov 21, 2024
8f4507a
testToArrayThrowsMeaningfulErrorWhenIteratorIsAlreadyOpen
halaxa Nov 21, 2024
e96be16
cs-fix
halaxa Nov 21, 2024
99e219c
composer update on build
halaxa Nov 21, 2024
0c5096a
Merge branch 'master' into recursive
halaxa Nov 22, 2024
6b2b6e2
Readme merge fix
halaxa Nov 22, 2024
05dc2eb
dropped compatibility with older phpunit
halaxa Nov 22, 2024
644fe90
Fix build
halaxa Nov 22, 2024
0857857
phpstan version fix
halaxa Nov 23, 2024
f051ff5
Readme update
halaxa Nov 23, 2024
da15ab2
NestedIterator remnants deleted
halaxa Nov 23, 2024
7c62a01
Performance improvements
halaxa Nov 23, 2024
1df75dd
Dead code removal
halaxa Nov 23, 2024
21bea75
Recursive focused performace optimizations. Ops outside the main fore…
halaxa Nov 23, 2024
4275f18
Readme updates
halaxa Nov 24, 2024
81db7bc
RecursiveItems::advanceToKey() chaining + array access
halaxa Nov 24, 2024
d5024df
Readme updates
halaxa Nov 24, 2024
2 changes: 2 additions & 0 deletions .php-cs-fixer.dist.php
Expand Up @@ -15,6 +15,8 @@
'visibility_required' => false,
'php_unit_test_class_requires_covers' => true,
'declare_strict_types' => true,
'phpdoc_to_comment' => false, // todo remove when we move to GeneratorAggregate

])
->setFinder($finder)
;
3 changes: 2 additions & 1 deletion CHANGELOG.md
Expand Up @@ -8,7 +8,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<br>

## master
Nothing yet
### Added
- Recursive iteration via new facade `RecursiveItems`. See **Recursive iteration** in README.

<br>

6 changes: 3 additions & 3 deletions Makefile
Expand Up @@ -29,7 +29,7 @@ help:
@grep -E '^[-a-zA-Z0-9_\.\/]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf " \033[32m%-15s\033[0m\t%s\n", $$1, $$2}'


build: composer-validate cs-check phpstan tests-all ## Run all necessary stuff before commit.
build: composer-update cs-check phpstan tests-all ## Run all necessary stuff before commit.


tests: ## Run tests on recent PHP version. Pass args to phpunit via ARGS=""
Expand Down Expand Up @@ -66,8 +66,8 @@ performance-tests: ## Run performance tests
@$(call DOCKER_RUN,$(LATEST_PHP),composer performance-tests)


composer-validate: ## Validate composer.json contents
@$(call DOCKER_RUN,$(LATEST_PHP),composer validate)
composer-update: ## Update Composer dependencies
@$(call DOCKER_RUN,$(LATEST_PHP),composer update)


release: .env build
152 changes: 127 additions & 25 deletions README.md
Expand Up @@ -8,6 +8,9 @@ for PHP >=7.2. See [TL;DR](#tl-dr). No dependencies in production except optiona
[![Latest Stable Version](https://img.shields.io/github/v/release/halaxa/json-machine?color=blueviolet&include_prereleases&logoColor=white)](https://packagist.org/packages/halaxa/json-machine)
[![Monthly Downloads](https://img.shields.io/packagist/dt/halaxa/json-machine?color=%23f28d1a)](https://packagist.org/packages/halaxa/json-machine)

---
NEW in version `1.2.0` - [Recursive iteration](#recursive)

---

* [TL;DR](#tl-dr)
Expand All @@ -18,6 +21,7 @@ for PHP >=7.2. See [TL;DR](#tl-dr). No dependencies in production except optiona
+ [Parsing nested values in arrays](#parsing-nested-values)
+ [Parsing a single scalar value](#getting-scalar-values)
+ [Parsing multiple subtrees](#parsing-multiple-subtrees)
+ [Recursive iteration](#recursive)
+ [What is JSON Pointer anyway?](#json-pointer)
* [Options](#options)
* [Parsing streaming responses from a JSON API](#parsing-json-stream-api-responses)
Expand Down Expand Up @@ -51,10 +55,12 @@ for PHP >=7.2. See [TL;DR](#tl-dr). No dependencies in production except optiona

use \JsonMachine\Items;

// this often causes Allowed Memory Size Exhausted
// this often causes Allowed Memory Size Exhausted,
// because it loads all the items in the JSON into memory
- $users = json_decode(file_get_contents('500MB-users.json'));

// this usually takes few kB of memory no matter the file size
// this has a very small memory footprint no matter the file size,
// because it loads items into memory one by one
+ $users = Items::fromFile('500MB-users.json');

foreach ($users as $id => $user) {
Expand All @@ -67,9 +73,10 @@ Random access like `$users[42]` is not yet possible.
Use above-mentioned `foreach` and find the item or use [JSON Pointer](#parsing-a-subtree).

Count the items via [`iterator_count($users)`](https://www.php.net/manual/en/function.iterator-count.php).
Remember it will still have to internally iterate the whole thing to get the count and thus will take about the same time.
Remember it will still have to internally iterate the whole thing to get the count and thus will take about the same time
as iterating it and counting by hand.
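
The equivalence of the two counting approaches can be sketched without JSON Machine at all. The snippet below is only an illustration: `lazyUsers()` is a hypothetical generator standing in for a lazily parsed collection, and both counts have to consume it from start to end.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a lazily parsed collection (not JSON Machine itself).
function lazyUsers(): Generator
{
    foreach (['alice', 'bob', 'carol'] as $id => $name) {
        yield $id => $name;
    }
}

// iterator_count() walks the whole iterator internally ...
$countedByHelper = iterator_count(lazyUsers());

// ... which is the same amount of work as counting in a loop.
$countedByHand = 0;
foreach (lazyUsers() as $user) {
    ++$countedByHand;
}
```

Either way, every item is produced once, so neither approach avoids the cost of a full pass over the data.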

Requires `ext-json` if used out of the box. See [Decoders](#decoders).
Requires `ext-json` if used out of the box but doesn't if a custom decoder is used. See [Decoders](#decoders).

Follow [CHANGELOG](CHANGELOG.md).

Expand Down Expand Up @@ -318,6 +325,117 @@ foreach ($fruits as $key => $value) {
}
```

<a name="recursive"></a>
### Recursive iteration
Use `RecursiveItems` instead of `Items` when the JSON structure is difficult or even impossible to handle with `Items`
and JSON Pointers, or when the individual items you iterate are too big to decode at once.
Bear in mind that it is notably slower than `Items`.

When `RecursiveItems` encounters a list or dict in the JSON, it returns a new instance of itself,
which can in turn be iterated over, and the cycle repeats.
Thus, it never returns a PHP array or object, only scalar values or `RecursiveItems` instances.
No JSON dict or list is ever fully loaded into memory at once.

Let's see an example with many, many users with many, many friends:
```json
// users.json
[
  {
    "username": "user",
    "e-mail": "[email protected]",
    "friends": [
      {
        "username": "friend1",
        "e-mail": "[email protected]"
      },
      {
        "username": "friend2",
        "e-mail": "[email protected]"
      }
    ]
  }
]
```

```php
<?php

use JsonMachine\RecursiveItems;

$users = RecursiveItems::fromFile('users.json');
foreach ($users as $user) {
    /** @var RecursiveItems $user */
    foreach ($user as $field => $value) {
        if ($field === 'friends') {
            /** @var RecursiveItems $value */
            foreach ($value as $friend) {
                /** @var RecursiveItems $friend */
                foreach ($friend as $friendField => $friendValue) {
                    $friendField == 'username';
                    $friendValue == 'friend1';
                }
            }
        }
    }
}
```

> If you break out of the iteration of such a lazy deeper level (i.e. you skip some `"friends"` via `break`)
> and advance to the next value (i.e. the next `user`), you will not be able to iterate the skipped level later.
> JSON Machine must iterate it in the background to be able to read the next value.
> Any such attempt will result in a closed generator exception.
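
The closed generator behaviour mentioned in the note can be reproduced with a plain PHP generator. This is only a sketch of the general PHP mechanism, not of JSON Machine's internals: `friends()` is a hypothetical stand-in for a lazy nested collection, and the `while` loop imitates the parser draining the abandoned level in the background.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a lazy nested collection (not JSON Machine's code).
function friends(): Generator
{
    yield 'friend1';
    yield 'friend2';
}

$friends = friends();
foreach ($friends as $friend) {
    break; // abandon the nested level after the first friend
}

// Imitates the parser finishing the abandoned level so it can read on.
while ($friends->valid()) {
    $friends->next();
}

$errorMessage = null;
try {
    foreach ($friends as $friend) {
        // never reached: the generator has already run to completion
    }
} catch (Exception $e) {
    $errorMessage = $e->getMessage();
}
```

After this runs, `$errorMessage` holds the message of the exception PHP throws when a finished generator is traversed again. If the data of a nested level is needed later, it has to be consumed (or materialized) before moving on.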

#### Convenience methods of `RecursiveItems`
- `toArray(): array`
If you are sure that a certain instance of `RecursiveItems` points to a memory-manageable data structure
(for example, `$friend`), you can call `$friend->toArray()`, and the item will materialize into a plain PHP array.

- `advanceToKey(int|string $key): scalar|RecursiveItems`
When searching for a specific key in a collection (for example, `'friends'` in `$user`),
you do not need a loop and a condition to find it.
Instead, you can simply call `$user->advanceToKey("friends")`.
It will iterate for you and return the value at this key. Calls can be chained.
It also supports **array-like syntax** for advancing to and getting subsequent indices,
so `$user['friends']` is an alias for `$user->advanceToKey('friends')`.
Keep in mind that it is just an alias: **you won't be able to random-access previous indices**
after using this directly on `RecursiveItems`. It is only syntactic sugar.
Use `toArray()` if you need random access to indices on a record/item.

The previous example could thus be simplified as follows:
```php
<?php

use JsonMachine\RecursiveItems;

$users = RecursiveItems::fromFile('users.json');
foreach ($users as $user) {
    /** @var RecursiveItems $user */
    foreach ($user['friends'] as $friend) { // or $user->advanceToKey('friends')
        /** @var RecursiveItems $friend */
        $friendArray = $friend->toArray();
        $friendArray['username'] === 'friend1';
    }
}
```
Chaining allows you to do something like this:
```php
<?php

use JsonMachine\RecursiveItems;

$users = RecursiveItems::fromFile('users.json');
$users[0]['friends'][1]['username'] === 'friend2';
```
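
Why previous indices cannot be reached again after `advanceToKey()` can be illustrated with a forward-only iterator. The following is a minimal sketch, not the library's implementation: `advanceToKeySketch()` and `userFields()` are hypothetical names, and the SPL `OutOfBoundsException` is used only for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of JSON Machine): moves a forward-only
 * iterator ahead until $key is found and returns the value stored there.
 */
function advanceToKeySketch(Iterator $items, $key)
{
    while ($items->valid()) {
        if ($items->key() === $key) {
            return $items->current();
        }
        $items->next();
    }

    throw new OutOfBoundsException("Key '$key' not found ahead of the current position.");
}

// Hypothetical stand-in for one lazily parsed user record.
function userFields(): Generator
{
    yield 'username' => 'user';
    yield 'e-mail' => 'user@example.test';
    yield 'friends' => ['friend1', 'friend2'];
}

$user = userFields();
$friends = advanceToKeySketch($user, 'friends'); // skips 'username' and 'e-mail'

$lookBackFailed = false;
try {
    advanceToKeySketch($user, 'username'); // already passed, cannot go back
} catch (OutOfBoundsException $e) {
    $lookBackFailed = true;
}
```

Because the underlying source is consumed as it is read, a key that has already been passed is simply gone, which is why materializing a record with `toArray()` is the suggested route when random access is needed.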

#### Also `RecursiveItems implements \RecursiveIterator`
So you can use PHP's built-in tools that work with `\RecursiveIterator`, for example:

- [RecursiveCallbackFilterIterator](https://www.php.net/manual/en/class.recursivecallbackfilteriterator.php)
- [RecursiveFilterIterator](https://www.php.net/manual/en/class.recursivefilteriterator.php)
- [RecursiveRegexIterator](https://www.php.net/manual/en/class.recursiveregexiterator.php)
- [RecursiveTreeIterator](https://www.php.net/manual/en/class.recursivetreeiterator.php)
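
As a rough sketch of how such tools compose, the snippet below filters a nested structure down to all `username` values. To stay runnable without the library, it uses PHP's `RecursiveArrayIterator` as a stand-in data source; passing a `RecursiveItems` instance in its place is an assumption based on the `\RecursiveIterator` interface, not something verified here.

```php
<?php

declare(strict_types=1);

// Stand-in data source; RecursiveArrayIterator plays the role of RecursiveItems.
$users = new RecursiveArrayIterator([
    [
        'username' => 'user',
        'friends' => [
            ['username' => 'friend1'],
            ['username' => 'friend2'],
        ],
    ],
]);

// Accept every branch so traversal can descend, but only leaves keyed "username".
$filter = new RecursiveCallbackFilterIterator(
    $users,
    function ($current, $key, RecursiveIterator $iterator): bool {
        return $iterator->hasChildren() || $key === 'username';
    }
);

$usernames = [];
foreach (new RecursiveIteratorIterator($filter) as $username) {
    $usernames[] = $username;
}
```

`RecursiveIteratorIterator` flattens the tree and, in its default mode, yields only leaves, so `$usernames` ends up containing the three usernames in document order.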

<a name="json-pointer"></a>
### What is JSON Pointer anyway?
It's a way of addressing one item in JSON document. See the [JSON Pointer RFC 6901](https://tools.ietf.org/html/rfc6901).
Expand Down Expand Up @@ -516,30 +634,14 @@ but you forgot to specify a JSON Pointer. See [Parsing a subtree](#parsing-a-sub
### "That didn't help"
The other reason may be that one of the items you iterate is itself so huge that it cannot be decoded at once.
For example, you iterate over users and one of them has thousands of "friend" objects in it.
Use `PassThruDecoder` which does not decode an item, get the json string of the user
and parse it iteratively yourself using `Items::fromString()`.

```php
<?php

use JsonMachine\Items;
use JsonMachine\JsonDecoder\PassThruDecoder;

$users = Items::fromFile('users.json', ['decoder' => new PassThruDecoder]);
foreach ($users as $user) {
foreach (Items::fromString($user, ['pointer' => "/friends"]) as $friend) {
// process friends one by one
}
}
```
The most efficient solution is to use `RecursiveItems` instead of `Items`.
See [Recursive iteration](#recursive).

<a name="step3"></a>
### "I am still out of luck"
It probably means that the JSON string `$user` itself or one of the friends are too big and do not fit in memory.
However, you can try this approach recursively. Parse `"/friends"` with `PassThruDecoder` getting one `$friend`
json string at a time and then parse that using `Items::fromString()`... If even that does not help,
there's probably no solution yet via JSON Machine. A feature is planned which will enable you to iterate
any structure fully recursively and strings will be served as streams.
It probably means that a single JSON scalar string is itself too big to fit in memory,
for example a very big base64-encoded file.
In that case you will probably still be out of luck until JSON Machine supports yielding scalar values as PHP streams.

<a name="installation"></a>
## Installation
5 changes: 3 additions & 2 deletions composer.json
Expand Up @@ -13,7 +13,7 @@
"tests-coverage": "build/composer-update.sh && XDEBUG_MODE=coverage vendor/bin/phpunit --coverage-clover clover.xml",
"cs-check": "build/composer-update.sh && vendor/bin/php-cs-fixer fix --dry-run --verbose --allow-risky=yes",
"cs-fix": "build/composer-update.sh && vendor/bin/php-cs-fixer fix --verbose --allow-risky=yes",
"phpstan": "build/composer-update.sh && vendor/bin/phpstan analyse",
"phpstan": "build/composer-update.sh && vendor/bin/phpstan --memory-limit=-1 analyse",
"performance-tests": "php -d xdebug.mode=off -d opcache.enable_cli=1 -d opcache.jit_buffer_size=100M test/performance/testPerformance.php"
},
"config": {
Expand All @@ -35,7 +35,8 @@
},
"autoload": {
"psr-4": {"JsonMachine\\": "src/"},
"exclude-from-classmap": ["src/autoloader.php"]
"exclude-from-classmap": ["src/autoloader.php"],
"files": ["src/functions.php"]
},
"autoload-dev": {
"psr-4": {"JsonMachineTest\\": "test/JsonMachineTest"}
9 changes: 9 additions & 0 deletions src/Exception/BadMethodCallException.php
@@ -0,0 +1,9 @@
<?php

declare(strict_types=1);

namespace JsonMachine\Exception;

class BadMethodCallException extends JsonMachineException
{
}
9 changes: 9 additions & 0 deletions src/Exception/OutOfBoundsException.php
@@ -0,0 +1,9 @@
<?php

declare(strict_types=1);

namespace JsonMachine\Exception;

class OutOfBoundsException extends JsonMachineException
{
}
80 changes: 80 additions & 0 deletions src/FacadeTrait.php
@@ -0,0 +1,80 @@
<?php

declare(strict_types=1);

namespace JsonMachine;

use JsonMachine\Exception\InvalidArgumentException;
use JsonMachine\JsonDecoder\ExtJsonDecoder;
use LogicException;

trait FacadeTrait
{
    /**
     * @var Parser
     */
    private $parser;

    /**
     * @var bool
     */
    private $debugEnabled;

    public function isDebugEnabled(): bool
    {
        return $this->debugEnabled;
    }

    /**
     * @throws InvalidArgumentException
     */
    private static function createParser(iterable $bytesIterator, ItemsOptions $options, bool $recursive): Parser
    {
        if ($options['debug']) {
            $tokensClass = TokensWithDebugging::class;
        } else {
            $tokensClass = Tokens::class;
        }

        return new Parser(
            new $tokensClass(
                $bytesIterator
            ),
            $options['pointer'],
            $options['decoder'] ?: new ExtJsonDecoder(),
            $recursive
        );
    }

    /**
     * Returns JSON bytes read so far.
     */
    public function getPosition()
    {
        if ($this->parser instanceof PositionAware) {
            return $this->parser->getPosition();
        }

        throw new LogicException('getPosition() may only be called on PositionAware');
    }

    /**
     * @param string $string
     */
    abstract public static function fromString($string, array $options = []): self;

    /**
     * @param string $file
     */
    abstract public static function fromFile($file, array $options = []): self;

    /**
     * @param resource $stream
     */
    abstract public static function fromStream($stream, array $options = []): self;

    /**
     * @param iterable $iterable
     */
    abstract public static function fromIterable($iterable, array $options = []): self;
}