
Create a new README and patch bugs from rewrite #90


Open: wants to merge 39 commits into base: dev
Conversation

HyperCodec
Owner

No description provided.

@HyperCodec HyperCodec added documentation Improvements or additions to documentation enhancement New feature or request labels Feb 4, 2025
@HyperCodec HyperCodec self-assigned this Feb 4, 2025
@HyperCodec HyperCodec linked an issue Feb 4, 2025 that may be closed by this pull request
@HyperCodec
Owner Author

I still need to test the example and maybe polish documentation a bit before merging.

@HyperCodec HyperCodec marked this pull request as ready for review February 4, 2025 17:23
@HyperCodec
Owner Author

Weird, I'm encountering a lot of bugs while testing with an example that don't appear in the actual unit tests. I'm not sure why this is.

@HyperCodec
Owner Author

HyperCodec commented Feb 4, 2025

it seems to always hang after the first 2-3 generations. it happens in both crossover and division reproduction.

@HyperCodec
Owner Author

printing from the yield_now, it says Idle, meaning there are no more tasks and this is probably an issue with input_count. if it is an input_count issue, it's most likely related to RandomlyMutable, since that's used in both types of reproduction.
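If input_count is the culprit, a quick consistency check after each mutation would confirm it: every neuron's cached input_count should equal the number of connections actually pointing at it. This is a sketch with illustrative types (`ToyNeuron` and `input_counts_consistent` are made up for the example, not the crate's API):

```rust
// Toy stand-in for a neuron: a cached input count plus outgoing connections.
struct ToyNeuron {
    input_count: usize,
    outputs: Vec<usize>, // indices of downstream neurons
}

// Recomputes the real in-degree of every neuron from the outgoing edges and
// compares it against the cached input_count.
fn input_counts_consistent(neurons: &[ToyNeuron]) -> bool {
    let mut actual = vec![0usize; neurons.len()];
    for n in neurons {
        for &dst in &n.outputs {
            actual[dst] += 1;
        }
    }
    neurons
        .iter()
        .zip(&actual)
        .all(|(n, &a)| n.input_count == a)
}

fn main() {
    // 0 -> 1, 0 -> 2, 1 -> 2; neuron 2 claims only 1 input, so this fails.
    let net = vec![
        ToyNeuron { input_count: 0, outputs: vec![1, 2] },
        ToyNeuron { input_count: 1, outputs: vec![2] },
        ToyNeuron { input_count: 1, outputs: vec![] }, // should be 2
    ];
    assert!(!input_counts_consistent(&net));
}
```

Running a check like this in a debug_assert! after every mutation would pinpoint exactly which mutation desyncs the count.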

@viw-ty

viw-ty commented Feb 4, 2025

So I've just stumbled onto this crate, and I'm not even sure how you're getting past the first generation; I just get errors coming from here. I'll try to get it to work.

stack backtrace:
   // unwind
   3: rand::rng::Rng::gen_range
             at /home/virtio/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/rand-0.8.5/src/rng.rs:134:9
   4: neat::neuralnet::NeuralNetwork<_,_>::random_location
             at /home/virtio/.cargo/git/checkouts/neat-ea8e987889abd0bb/1a8896b/src/neuralnet.rs:253:41
   5: neat::neuralnet::NeuralNetwork<_,_>::random_location_in_scope
             at /home/virtio/.cargo/git/checkouts/neat-ea8e987889abd0bb/1a8896b/src/neuralnet.rs:265:19
   6: <neat::neuralnet::NeuralNetwork<_,_> as genetic_rs_common::builtin::RandomlyMutable>::mutate
             at /home/virtio/.cargo/git/checkouts/neat-ea8e987889abd0bb/1a8896b/src/neuralnet.rs:381:22
   7: <neat::neuralnet::NeuralNetwork<_,_> as genetic_rs_common::builtin::DivisionReproduction>::divide
             at /home/virtio/.cargo/git/checkouts/neat-ea8e987889abd0bb/1a8896b/src/neuralnet.rs:417:13
   8: <n::Agent as genetic_rs_common::builtin::DivisionReproduction>::divide
             at ./src/main.rs:9:45
   9: genetic_rs_common::builtin::next_gen::division_pruning_nextgen
             at /home/virtio/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/genetic-rs-common-0.5.4/src/builtin.rs:111:27
  10: core::ops::function::Fn::call
             at /home/virtio/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:79:5
  11: <F as genetic_rs_common::NextgenFn<G>>::next_gen
             at /home/virtio/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/genetic-rs-common-0.5.4/src/lib.rs:41:9
  12: genetic_rs_common::GeneticSim<F,NG,G>::next_generation::{{closure}}
             at /home/virtio/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/genetic-rs-common-0.5.4/src/lib.rs:198:13
  13: replace_with::replace_with::{{closure}}
             at /home/virtio/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/replace_with-0.1.7/src/lib.rs:156:31
  14: replace_with::on_unwind
             at /home/virtio/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/replace_with-0.1.7/src/lib.rs:105:10
  15: replace_with::replace_with
             at /home/virtio/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/replace_with-0.1.7/src/lib.rs:156:13
  16: replace_with::replace_with_or_abort
             at /home/virtio/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/replace_with-0.1.7/src/lib.rs:244:2
  17: genetic_rs_common::GeneticSim<F,NG,G>::next_generation
             at /home/virtio/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/genetic-rs-common-0.5.4/src/lib.rs:189:9
  18: n::main
             at ./src/main.rs:65:5
  19: core::ops::function::FnOnce::call_once
             at /home/virtio/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
use std::ops::Deref;

use neat::*;

const TESTS: [(usize, usize, usize); 4] = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)];
const INPUTS: usize = 2;
const OUTPUTS: usize = 2;

#[derive(Clone, PartialEq, RandomlyMutable, DivisionReproduction)]
struct Agent {
    net: NeuralNetwork<INPUTS, OUTPUTS>,
}

impl Prunable for Agent {}

impl GenerateRandom for Agent {
    fn gen_random(rng: &mut impl rand::Rng) -> Self {
        Self {
            net: NeuralNetwork::new(MutationSettings::default(), rng),
        }
    }
}

impl Deref for Agent {
    type Target = NeuralNetwork<INPUTS, OUTPUTS>;

    fn deref(&self) -> &Self::Target {
        &self.net
    }
}

fn fitness(agent: &Agent) -> f32 {
    let mut fit = 0.0;
    for (a, b, c) in TESTS {
        let prediction = agent.predict([a as f32, b as f32]);
        let correct = max_index(prediction) == c;

        fit += match correct {
            true => 1.0,
            false => -0.1,
        }
    }
    fit
}

fn max_index<T>(v: T) -> usize
where
    T: IntoIterator,
    <T as IntoIterator>::Item: PartialOrd,
{
    v.into_iter()
        .enumerate()
        .reduce(|(ai, av), (bi, bv)| match bv > av {
            true => (bi, bv),
            false => (ai, av),
        })
        .unwrap()
        .0
}

fn main() {
    let mut sim = GeneticSim::new(Vec::gen_random(1000), fitness, division_pruning_nextgen);

    sim.next_generation();
}

@HyperCodec
Owner Author

So I've just stumbled into this crate, and I'm not even sure how you're getting past the first generation, I just get errors coming from here. I'll try to get it to work.


I have it patched already on a codespace, I just discovered more bugs so I didn't end up pushing.

@viw-ty

viw-ty commented Feb 4, 2025

Alright, I'll just figure out the easiest way and stare at it more KEKW. Good job on what is (as far as I know) somehow the best-documented ML crate, by the way.

@HyperCodec
Owner Author

HyperCodec commented Feb 4, 2025

I did end up discovering a bunch of cases within stuff like add_connection where I forgot to update input_count (I had added random mutation before adding the field to neuron).

After patching like 3 of them and printing whenever a mutation occurs in a generation, it passes a lot more generations but eventually still hangs, even if there were no mutations occurring in the generation directly prior to the one where it hangs.

I’ll probably push these patches either tonight or tomorrow morning (my weekdays are super busy and I mainly just work on this in study hall and downtime between classes) so you can take a look.

@viw-ty

viw-ty commented Feb 4, 2025

I have no clue how far you are, but I'm going off and will just abuse this as a sticky note:
After having to learn how to use gdb on the fly, and being an idiot, this is the lock.
[screenshot attached]

@viw-ty

viw-ty commented Feb 4, 2025

oh wait I'm stupid this is literally the lock and I spent god knows how long doing nothing

@HyperCodec
Owner Author

HyperCodec commented Feb 4, 2025

oh wait I'm stupid this is literally the lock and I spent god knows how long doing nothing

yeah spamming println everywhere typically works better than gdb in deadlock/livelock situations. i had to implement a rather complex task management system to handle joins properly without wasting a ton of cpu or spamming locks (which fully deadlock on all threads because they put the entire rayon thread to sleep while locked), and if i really wanted to make it more efficient i'd have to fork rayon, which sounds awful to do. and i have no idea if the system actually works because my unit tests apparently just completely lied to me.

i still have no clue how the unit tests manage to pass but not an example that does something pretty similar.

@viw-ty

viw-ty commented Feb 5, 2025

For now, this seems to fix the lock without any immediately apparent downsides, I'm not sure if I'm missing something important here, but I'll roll with it.

diff --git a/src/neuralnet.rs b/src/neuralnet.rs
index cce0d61..9e9ee3a 100644
--- a/src/neuralnet.rs
+++ b/src/neuralnet.rs
@@ -137,19 +139,17 @@ impl<const I: usize, const O: usize> NeuralNetwork<I, O> {
     fn eval(&self, loc: impl AsRef<NeuronLocation>, cache: Arc<NeuralNetCache<I, O>>) {
         let loc = loc.as_ref();
 
-        if !cache.claim(loc) {
+        if !cache.is_ready(loc) && !cache.claim(loc) {
+            // two things can happen here:
+            // we're trying to get a neuron that's not ready, which
+            // seems to cause deadlocks every time
+            // or
             // some other thread is already
             // waiting to do this task, currently doing it, or done.
             // no need to do it again.
             return;
         }
 
-        while !cache.is_ready(loc) {
-            // essentially spinlocks until the dependency tasks are complete,
-            // while letting this thread do some work on random tasks.
-            rayon::yield_now();
-        }
-
         let val = cache.get(loc);
         let n = self.get_neuron(loc);

@HyperCodec
Owner Author

HyperCodec commented Feb 5, 2025

For now, this seems to fix the lock without any immediately apparent downsides, I'm not sure if I'm missing something important here, but I'll roll with it.

The cache by default is not ready, meaning that this code essentially just returns when it reaches the first hidden layer neuron. The cache.is_ready method only returns true when finished_inputs >= expected_inputs (i.e. all neurons that affect that neuron’s value have finished computing). The cache.claim is used to ensure that only one worker can sit there and spinlock on the value, and kills off all tasks that reach a neuron that’s already being processed. A task that has claimed a neuron needs to wait until that neuron’s value is ready before it tries to operate on it.

Btw the reason this whole system is in place is that rayon::yield_now lets the thread work on other tasks while this task waits for a dependency task to complete (which could be behind it in that thread's queue). If I had simply used an std-provided lock type such as RwLock, a task that reaches a neuron before the rest of the tasks can push their data to it would steal the lock and block those tasks from being able to push the data. Additionally, rayon allocates multiple tasks to each thread, and std lock types force the entire thread to sleep, which freezes all subsequent tasks on that thread. #58 has a decent explanation if you're still confused.
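The claim/ready semantics described above can be sketched with atomics. The field and method names mirror the description (finished_inputs, expected_inputs, claim, is_ready), but the type itself is a stand-in for illustration, not the real NeuralNetCache:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Hypothetical per-neuron bookkeeping, loosely modeling the described cache.
struct NeuronState {
    claimed: AtomicBool,
    finished_inputs: AtomicUsize,
    expected_inputs: usize,
}

impl NeuronState {
    fn new(expected_inputs: usize) -> Self {
        Self {
            claimed: AtomicBool::new(false),
            finished_inputs: AtomicUsize::new(0),
            expected_inputs,
        }
    }

    // `claim` succeeds for exactly one task; every later task sees `false`
    // and bails, so only one worker spin-waits on this neuron.
    fn claim(&self) -> bool {
        self.claimed
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    // Ready once every upstream neuron has pushed its value.
    fn is_ready(&self) -> bool {
        self.finished_inputs.load(Ordering::Acquire) >= self.expected_inputs
    }

    // Called by each upstream task after it finishes computing.
    fn push_input(&self) {
        self.finished_inputs.fetch_add(1, Ordering::AcqRel);
    }
}

fn main() {
    let n = NeuronState::new(2);
    assert!(n.claim());     // first task wins the claim
    assert!(!n.claim());    // every later task is cancelled
    assert!(!n.is_ready()); // no inputs have arrived yet
    n.push_input();
    n.push_input();
    assert!(n.is_ready());  // all expected inputs finished
}
```

With this shape, the deadlock described earlier corresponds to a claimed neuron whose expected_inputs is never reached because some mutation left the count wrong.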

@HyperCodec
Owner Author

HyperCodec commented Feb 5, 2025

I pushed some of my patches if you want to take a look. I drastically reduced the size of the simulation in readme_ex as well, just to get a better view of what's going on without 100 concurrent simulations blasting stdout with text.

@HyperCodec
Owner Author

HyperCodec commented Feb 5, 2025

[screenshot of the run attached]
In this run, it gets through 6 generations before a hang occurs, but there was not a single mutation aside from weights (which shouldn't ever cause a deadlock since it's just multiplication). This means there's probably something wrong with crossover.

@HyperCodec
Owner Author

For now, this seems to fix the lock without any immediately apparent downsides, I'm not sure if I'm missing something important here, but I'll roll with it.


Oh wait maybe you are right. There's no point in a complex claiming and waiting system if I can just make it so the last task to reach the neuron runs on it and then cancel any other ones. Not sure how I didn't think of that.
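A minimal sketch of that last-arriver idea, assuming each upstream task bumps a counter when it pushes its value (illustrative code, not the crate's implementation): whichever task completes the count evaluates the neuron, and every earlier arrival simply returns, so no claiming or spin-waiting is needed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Per-neuron arrival counter for the "last task to reach the neuron runs it"
// scheme. Names are made up for the example.
struct Arrivals {
    finished: AtomicUsize,
    expected: usize,
}

impl Arrivals {
    fn new(expected: usize) -> Self {
        Self {
            finished: AtomicUsize::new(0),
            expected,
        }
    }

    // Returns true for exactly the task whose push completes the set;
    // all earlier arrivers get false and just return.
    fn push_and_check(&self) -> bool {
        self.finished.fetch_add(1, Ordering::AcqRel) + 1 == self.expected
    }
}

fn main() {
    let a = Arrivals::new(3);
    assert!(!a.push_and_check()); // first input: not yet
    assert!(!a.push_and_check()); // second: still waiting
    assert!(a.push_and_check());  // third arriver runs the neuron
}
```

Because fetch_add returns the previous value atomically, exactly one task sees the completing count, even under contention, which is what makes the claim/spin machinery unnecessary.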

@HyperCodec
Owner Author

might want to add the tracing crate to this as a feature or something to help with debugging. i'll create a separate issue for it

@HyperCodec
Owner Author

Without the filter_map in handle_removed, it can sometimes overflow the stack because of cyclic connections. This means the input_count is probably 0 despite there being connections in the network. I'm not sure whether to keep the filter_map, which would hide some random bugs instead of letting them crash (without it, a user calling remove_neuron on a linked neuron would just break things), or drop it.

Perhaps I should have two remove_neuron methods, one public and one private, so that I still get that panic reassurance when I know there shouldn't be any inputs, but the public one is still user-friendly?
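The public/private split could look roughly like this on a toy adjacency-list network (everything here is illustrative, including the _unchecked name; it is not the crate's real API):

```rust
// Toy network: incoming[i] lists the neurons feeding neuron i.
struct ToyNet {
    incoming: Vec<Vec<usize>>,
}

impl ToyNet {
    /// User-facing: detaches every connection touching `target` first,
    /// so it is safe to call on a neuron that still has inputs.
    pub fn remove_neuron(&mut self, target: usize) {
        // drop every connection *from* target
        for inputs in &mut self.incoming {
            inputs.retain(|&src| src != target);
        }
        // drop every connection *into* target
        self.incoming[target].clear();
        self.remove_neuron_unchecked(target);
    }

    /// Internal: assumes the caller already guaranteed there are no inputs,
    /// and panics loudly (the "panic reassurance") if that invariant breaks.
    fn remove_neuron_unchecked(&mut self, target: usize) {
        assert!(
            self.incoming[target].is_empty(),
            "neuron still has inputs"
        );
        self.incoming.remove(target);
        // note: a real implementation would also reindex any stored
        // indices greater than `target` after the removal.
    }
}

fn main() {
    // 0 -> 1, 0 -> 2, 1 -> 2; removing neuron 1 detaches its edges first.
    let mut net = ToyNet {
        incoming: vec![vec![], vec![0], vec![0, 1]],
    };
    net.remove_neuron(1);
    assert_eq!(net.incoming.len(), 2);
    assert_eq!(net.incoming[1], vec![0]); // old neuron 2 keeps its 0-input
}
```

The internal method keeps the crash-early behavior for call sites where no inputs are expected, while the public wrapper does the detach work for users.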

@HyperCodec
Owner Author

Also probably going to branch out and implement #94 just so it's easier to debug this PR.

@HyperCodec
Owner Author

oops for whatever reason my git is on a different branch but still pushed to here smh

@HyperCodec
Owner Author

HyperCodec commented Apr 10, 2025

Considering #83 to help even further with debugging. Tracing adds a lot of context but it's still pretty hard to visualize what's happening. Also it spams a ton of output. Would be better if I could dump all that info somewhere and then render/visualize it.

@HyperCodec HyperCodec changed the title Create a new README and patch some bugs from rewrite Create a new README and patch bugs from rewrite Apr 10, 2025
@HyperCodec
Owner Author

The problem with doing any rendering stuff is that I do most of the work for this crate on a codespace, which can't render windows. Maybe I could output it to an image file, but then it's hard to really browse the spans and such.

@HyperCodec
Owner Author

Working on a pretty major update for genetic-rs so going to delay this again for that. Don't think it should be a major delay bc the overall API is relatively the same. Helps solve issues like #92.

@HyperCodec
Owner Author

HyperCodec commented May 12, 2025

Actually since the changes are relatively small, I could probably just make another PR once this is merged. I'm kind of tired of doing everything in this one branch. It's been open for way too long.

Labels
bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Finish new README
2 participants