Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If interrupt occurs while doing the asynchronous read of I2C, the poll will be rejected and the chip can be left in unrecoverable state #59

Closed
lynniemagoo opened this issue Dec 17, 2023 · 39 comments
Assignees
Labels
bug Something isn't working

Comments

@lynniemagoo
Copy link
Contributor

lynniemagoo commented Dec 17, 2023

Hello Peter,

I've spent the past few days trying to get to the bottom of an issue. I have a project I'm working on that monitors various IO Expander chips and it relies solely on interrupts to detect changes once the Chip class is constructed. I've specifically modelled a cat9555 and mcp23017 after the pcf857X typescript code.

If one is solely relying on interrupts to drive chip state instead of using doPoll() , it is entirely possible that should multiple pins change arbitrarily, that multiple interrupts can occur. Should an interrupt occur when reading the chip asynchronously in _doPoll() then a rejection ("An other poll is active") would immediately occur. Once this rejection occurs, it has been proven via my testing on all my IO chips including both a PCF8574 and PCF8575 that no additional interrupts will fire until an external doPoll() is requested. In my base code, I added a timeout to repeatedly call doPoll() but that is not an ideal solution as 98%+ of the time, no issues arise. So, I decided to go one layer lower, to the chip code.
The condition I'm speaking of can be avoided if upon detecting the error, issue one or more setTimeout calls to invoke a poll after the first I2C read has completed and the chip is available for a subsequent doPoll() call.

I will be opening a PR with a fix for this issue. The proposed fix uses up to 2 attempts to read the chip via _doPoll().

Let me know your thoughts. I've tested this fix across all my IO Expander modules and it works beautifully. I've not been able to get into a condition where an interrupt occurs and the code doesn't finally read the chip to clear the interrupt.

Happy Holidays,
Lyndel

lynniemagoo added a commit to lynniemagoo/peter-node-pcf8574 that referenced this issue Dec 17, 2023
…ous read of I2C, the poll will be rejected and the chip can be left in unrecoverable state

Bump version number
Provide tiered read using setTimeout
@lynniemagoo
Copy link
Contributor Author

There is one additional alternative I did not consider at first.
Instead of reading the 1 or 2 bytes asynchrously with a callback in _doPoll(), read I2C synchronously in a separate function _interruptPoll(). This would guarantee that the chip is read during the interrupt service routine but could have an unfortunate side-effect of missing interrupt as we are taking longer to service the interrupt.

Thoughts appreciated.

@lynniemagoo
Copy link
Contributor Author

I've attached a sample test file that illustrates how we could do delayed retries should something fail. This follows the promise reject retry design pattern from here:

https://stackoverflow.com/questions/38213668/promise-retry-design-patterns

test.zip

This would allow us to have a finite set of retries with a delay that starts small and grows to a max value.

I'm actually thinking of implementing this in my other modules. What do you think?

Lynde

@crycode-de
Copy link
Owner

Thank you for your explanations and figuring out this issue. 👍️

In my opinion it would be the best option to delay the execution of the _poll() method while _currentlyPolling is true.
Ideally this could use a promise queue (like described here) to queue up and handle all poll requests. I'll implement this in a separate branch. Maybe you can test this?

Using the sync i2c functions should be no option here since this will block the event loop of the whole node processes and so may affect applications using this module.

@crycode-de crycode-de self-assigned this Dec 19, 2023
@crycode-de crycode-de added the bug Something isn't working label Dec 19, 2023
crycode-de added a commit that referenced this issue Dec 19, 2023
... instead of rejecting them if an other poll is currently active
Added method `isPolling()` to get the current polling state

May fix #59
@crycode-de
Copy link
Owner

@lynniemagoo Please see the pollqueue branch and let me know what you think. 🙂

@lynniemagoo
Copy link
Contributor Author

I reviewed the code on the branch. Will be able to test later this evening.

It's a more elegant solution than the test file I submitted above. I like it. However be forewarned that it does come with memory issues.

Consistent Pin Changes or repeated calls to doPoll() in short timeframes can cause possible memory issues!

Use Case 1 - Interrupt-Driven Polling

Assumption is that PCF857x configured to manage interrupts and poll on interrupt.
Consider that a pin that is changing rapidly for either short bursts or for a constant rate, let's say every 5 ms.
The I2C read takes for example, 125ms.

In theory, we'd queue 25 promises before 1 promise could be serviced.

The promise-queue as-is would likely be OK for short bursts of pin changes as eventually the promise queue would be drained, but if is a constant interval of 5ms is present, then expect larger memory growth and if the condition does not cease, out of memory exceptions.

The solution that I proposed in the attached test file limits the amount of retries that can be done to 5 and also provides the ability to hold off subsequent requests using setTimeout.

There might be a better/hybrid solution.

The interrupt flag in the chip is cleared when the chip is read. So, all we really need to guarantee is that a _poll() will occur after the interrupt occurs. In theory, if the promise queue depth is > 1, this condition will always be satisfied as a subsequent promise will eventually execute _poll() to clear the chip.

Use Case 2 - Manual Polling
Same as above but instead of handling interrupts, application code calls doPoll() every 5ms but I2C read takes 125ms.

Here, we have the same conditions described in Use Case 1 above, promises are being queued faster than they can be serviced. On might say you cannot fix stupidity but things happen.

Proposal - Modified Promise-Queue implementation

Within _handleInterrupt() and doPoll(), what we might want to do here is to do a conditional enqueue() where if depth is 0 or 1, add a promise and if depth is > 1 then do nothing in _handleInterrupt() or immediately reject in doPoll().

This would require either an additional signature or an overload on the PromiseQueue's enqueue() method but would allow us to elegantly solve the problem.

Thoughts appreciated.

@crycode-de
Copy link
Owner

Oh yes... I completely missed that a poll may be triggered more often than the i2c bus can handle the requests. You're totally right!

I like your proposal of the modified promise queue implementation. I've added a commit containing maximum length limit for the queue. Now one poll may be active and one queue promise is allowed. Any additional try to add something to the queue will be rejected.

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 19, 2023

Just reviewied. Nice job!!!! I will test it later this evening. I thought we might be able to centralize a finally block but upon further review, I was incorrect. So I edited deleted modified dequeue() code from this post.

However, I think that in pcf857x.ts, we should change the max queue length to 2 (or 3 - see below) from 1.

A max limit of 1 results the same behavior as before, where if back to back interrupts fire and we are in middle of first poll when subsequent interrupt occurs (we'd reject if queue full) or if the application code manually polls and this poll is active at the time of the interrupt (see race conditions notes below), we could fail to service the interrupt as well.

Changing limit to 2 allows us to recover from back-to-back interrupts and would also allow an application that needs to call doPoll() alongside managing interrupts.

But as usual, as I type this, I'm wondering if 2 is sufficient. Consider fast interrupts and a user doing a poll from application code. If the user poll is active, and then an interrupt occurs, we'd still be able to service the first interrupt as the limit is 2. However, if 2nd interrupt occurs, and we reject, there is a race condition where the last I2C read might have completed just prior to the interrupt and the chip would not be read which leaves us as before.

For this reason, I'd say let's go the safer route and support a maximum limit of 3 vs 1 in pcf857x.ts.

Thoughts?
Lyndel

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 19, 2023

Hello Peter,
Update - Review this only if you are interested in looking at what I did and where I went wrong.

I got some time to test using the maxQueueDepth = 1 and am seeing rejected promises from _handleInterrupt. I have updated the .js versions of promise-queue and pcf857x to put in .catch() for the promises as well as logging stack trace where reject(s) occur. The example.js in the zip attached here is the file I'm using to test. I changed the address to 0x20 and added pins 4, 5, 6 as additional inputs. Then, I added setInterval doPoll() for 3ms.

On my test harness, from the log inside the zip, you can see the end result. If enqueue() resulting from doPoll() rejects inside the interval, all is well. However, if enqueue() resulting from _handleInterrupt rejects, we have handled promise rejection.

In the JS files, I attempted to capture the promise reject and a stack trace with hopes of fixing this issue. Please see my changes in pcf857x.js as an attempt to fix.

peter-stuff.zip

Thoughts appreciated.

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 19, 2023

Update and good news - : I pulled the project a 2nd time and used a slightly modified example file. I'm happy to report that the unhandled promise on _handleInterrupt went away. Likely the original cause was due to me stupidly adding some catch handlers in pcf857x.js which I then reverted when I was able to isolate the cause.

Anyway, attached is the example.js that I used successfully. I actually added a couple of console.log entries to the promise-queue.js file but did not include them here.

With the short poll interval alongside manual button presses causing interrupts, I do see a rejects periodically in the interval, with no apparent consistency with a queue depth of 1,

peter-stuff2.zip

For my 3rd test run, I'm using maxQueueDepth = 1 and I've disabled the interval and am 100% reliable on interrupts. The good news is that possibly due to timing, I'm not seeing any occurrences of rejection due to fast manual button presses.

To be 100% safe and if you agree, I'll reiterate what I said above and recommend we go with a depth of 3 to fully cover 2 interrupts plus one manual poll being handled concurrently. A max of 2 would likely prevent issue from occurring but I usually err on the side of caution and would choose 3 instead.

Let me know when you want me to test the next round of changes.

Thanks again for jumping into this quickly. As I mentioned, my cat9555 and mcp23017 projects are modelled after this project (literally a lift of the TS code with register reads/writes instead). With your permission, I'd also like to include this new logic in those private projects which I may make public when all done.

Thanks,
Lyndel

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 19, 2023

One final note.
All this testing was done on a Pi3B+ I did not test with a Pi Zero/ZeroW or a 3A+. I bet these would be slower.

I have the 3ms interval poll disabled and using maxDepth = 1 and have now been able to see occurrences where back to back interrupts can occur and the reject happens during the 2nd _handleInterrupt call. When I see this happen, it's usually within the first 30 seconds of process spin-up. After that, things settle down. But this test does indeed reinforce in my mind the need to have a limit > 1 to avoid a race I mentioned above that could happen should the I2C read might have completed just prior to the interrupt and therefore interrupts could get stuck.

Overall, the progress today makes me very happy and suggest to me that we need to have max depth as 2 or 3.
Of course, we'd likely want some additional documentation if a limit of 2 is chosen vs 3. See below.

I leave that decision up to you. :-)

maxDepth = 2 - Should document limitation that application can either manually poll or or enable interrupts only, but no both.
maxDepth = 3 - No need to document any limitations. Module should be able to handle both manual polls and interrupt-driven behavior.

@lynniemagoo
Copy link
Contributor Author

I left the process active (interrupt only) for over 5 hours and periodically ran multiple button presses simultaneously and over the 5 hours, I got around 11 rejects when the maxQueueDepth was 1.

Going back to my thoughts earlier, what I did not test is 2 different intervals each doing a poll which would be way overkill. However, I think with a queue depth of 3, we could handle it.

@crycode-de
Copy link
Owner

Thank you for the great analysis!

I think we should use a queue length of 3 to be save, which means that there may be effectively up to 4 polls "active" (one that is currently processed and 3 waiting). So we can be absolutely sure to not miss anything.

If someone requests more polls, this can/should be rejected since they are definitely not needed. The rejection in _handleInterrupt() is already handled/ignored as it doesn't matter. If doPoll() is called by the user, he should handle the rejection himself.

Also this limit and possible rejection should be documented. I'll do this probably tomorrow.

Of course you can use this also in your two derived projects. :)

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 20, 2023 via email

@crycode-de
Copy link
Owner

It's a little bit confusing...

The intended behavior with maxQueueLength = 3 is, that you are able to enqueue up to 4 polls at once. The first is directly started and the other three are enqueued for later processing.

public enqueue<T = void> (promise: () => Promise<T>): Promise<T> {
// check the maximum queue length
if (this.maxQueueLength !== undefined && this.queue.length >= this.maxQueueLength) {
return Promise.reject('Maximum queue length exceeded');
}

So with maxQueueLength = 3 there have to be already 3 tasks in the queue (and one currently working) to get the rejection.
Or am I missing something here?

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 20, 2023

Because of the >= on line 53, there can never be more than 2 entries in the queue, not 3. So with a limit of 3 there can be 2 in queue and one active for a total of 3, not 4

@crycode-de
Copy link
Owner

  1. enqueue call -> starts directly -> queue.length is 0
  2. enqueue call -> goes info queue[0] -> queue.length is 1
  3. enqueue call -> goes info queue[1] -> queue.length is 2
  4. enqueue call -> goes info queue[2] -> queue.length is 3
  5. enqueue call -> reject cause of queue.length is 3 and 3 >= 3 (maxQueueLength)

🤔

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 20, 2023

In the words of Homer Simpson, Dohhhh! Somehow I was not seeing that but when you outlined it, yes, I see it now.

We shall stick with 3 as that is extra bullet-proofing.

Let me know when you button it up and I'll regression if needed.

@lynniemagoo
Copy link
Contributor Author

Hey, I see you bumped the major version. I was wondering if it also makes sense at this time to change the pcf8574/pcf8575 constructor instead of an 'initialValue:? any' instead supply an 'options: NodeJS.Dict' and pull the initialValue from 'options'. If we were to do something like that, then we could also provide the ability to override the maxQueueDepth for the PromiseQueue.

I struggle with this one quite often as putting a Dictionary in there (NodeJS.Dict) is not ideal solution and really goes against strong typing but is a well-established javascript constructor pattern.

Thoughts?

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 21, 2023

Hey. Wanted to update you. I put latest promise-queue into my MCP23017 module (maxDepth=3) which is the most sensitive IO chip I have and if you don't read the chip after an interrupt, for sure it will stop sending interrupts. (same goes for CAT9555).
What I saw for PCF8754 was that it was less finicky about not firing interrupts but I know in the past that I'd missed input changes due to rejection.

MCP23017 works beautifully with 0 rejects as we expected and so does PCF8574. I have really exercised the hardware for both PCF8574 and MCP23017 and this these changes are solid.

On a side note, you should have also gotten some emails about being added as a collaborator on some private projects. I did this in transparency and in the MCP and CAT modules, you will see I removed unnecessary semicolons.
I'm really torn as some individuals I work with are religious about no semicolons and 2 space indentations. I kept the 2 space indentation as I wanted to be able to use your es-lint configurations to validate but I did remove the semicolons.

These projects are scoped to my user space in case I do want to publish to NPM. Right now, I'm thinking I want to combine the code into a single project as @LynnieMagoo/automation-expander

@crycode-de
Copy link
Owner

We could use an Interface to describe an options (opts) object as parameter for the constructor. Also we may use an overload to allow the "old style" constructor call to keep backwards compatibility, event it's not strictly needed cause of the mayor version bump.

Great to hear, that all works as expected now! :)

Thank you for the invitation to your projects. I'll have a look at them later today.

No semicolons... uff ... I'm going crazy if there are missing semicolons. 😅
Before releasing v4.0.x I wanted to migrate from the current local eslint config to my packaged eslint config which I'm using in most of my projects now. There are tailing semicolons enforced but this can be overwritten by the local config as well.

@crycode-de
Copy link
Owner

Just an idea...
What do you think about creating a generic i2c-expander module together?
This can combine the current four(?) ICs and maybe get added other hardware too.

The current abstract PCF857x class (with some small modifications) could be used as base for all digital IO ICs.

Soon I'll need something for an MCP3424 (4 channel ADC). This could be added as well using some abstract base class for ADC IDs.

crycode-de added a commit that referenced this issue Dec 21, 2023
Using 3 queued poll requests we can be sure to not miss any change
by interrupt or manually triggered poll. See #59
@lynniemagoo
Copy link
Contributor Author

Whatever you decide on the semicolons is fine. I will follow your lead there as well as the construction pattern in my derivatives. Check out the mcp and cat projects as to how I used you same model with chips supporting register reads

@lynniemagoo
Copy link
Contributor Author

2 of my 3 test setups

image

@lynniemagoo
Copy link
Contributor Author

Mcp23017 and pcf8574

One final suggestion. In the examples, change the default address to 0x20. I've tried in the states to get a PCF8574A from Amazon and every time I find a source with an image of PCF8574A, the always ship boards with PCF8574.

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 21, 2023

Ignore this for now - See comment below with PeterStuff3.zip attached.

Hi Peter,

I'm doing some more testing today with PCF8574 and I can confirm that we have resolved the issue with the interrupt rejection. Now, what I'm starting to see are missed interrupts. As I'm receiving interrupts, and processing pin state changes, I'm writing values to the chip as well via I2C. It appears that on the PCF8574, writing to the chip will also clear the need to fire an interrupt.

For example, my pins linked in software as follows:

Pin 7 - Input -> software forwarded to Pin 0 - Output
Pin 6 - Input -> software forwarded to Pin 1 - Output
Pin 5 - Input -> software forwarded to Pin 2 - Output
Pin 4 - Input -> software forwarded to Pin 3 - Output

So what is happening is that as I write updates to

I need to do some more digging into the datasheets but here's what I'm thinking.

      // read from the IC - type specific
      switch (this._type) {
        case PCF857x_TYPE.PCF8574:
          this._i2cBus.i2cRead(this._address, 1, Buffer.alloc(1), (err: Error, bytesRead: number, buffer: Buffer) => {
            // LRM - delay this this._currentlyPolling = false;
            if (err || bytesRead !== 1) {
              this._currentlyPolling = false;
              reject(err);
              return;
            }

            // Readstate is 8 bits.  Pins 0-7 are in byte.
            const readState = buffer[0];

            processRead(readState);
            // LRM - don't clear polling until processed all input pins.
            this._currentlyPolling = false;
          });
          break;

        case PCF857x_TYPE.PCF8575:
          this._i2cBus.i2cRead(this._address, 2, Buffer.alloc(2), (err: Error, bytesRead: number, buffer: Buffer) => {
            // LRM - Delay this this._currentlyPolling = false;
            if (err || bytesRead !== 2) {
              this._currentlyPolling = false;
              reject(err);
              return;
            }

            // Readstate is 16 bit reverse of byte ordering.  Pins 0-7 are in byte 0.  Pins 8-15 are in byte 1.
            const readState = buffer[0] | buffer[1] << 8;

            processRead(readState);
            // LRM - don't clear polling until processed all input pins.
            this._currentlyPolling = false;
          });
          break;
      }
    });

I don't think this will have any affect but with the console.log lines I added for testing, I'm for sure seeing that a read followed by multiple writes is causing an issue.

One other thing I thought about is to guarantee the ordering of things in the queue. For example, instead of having writes go directly to I2C (_setInternalState), have the write request added to the poll queue.

Like I said, I need to do some more testing but these changes have definitely affected timing somewhere and I sometimes end up having usually 1 LED stuck on (Low Level) when I've released all the buttons following a press.

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 21, 2023

Ignore this for now - See comment below with PeterStuff3.zip attached.

And another note. I'm not just seeing this on the PCF8574, I'm also seeing this on the MCP23017 which leads me to believe the problem is in the chip poll algorithms which I've standardized or is in my application code which just follows the observer pattern and upon an input pin emitting an event, we write a value to the corresponding output pin.

Will keep you posted over the next few days.

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 21, 2023

Ignore this for now - See comment below with PeterStuff3.zip attached.
The more I research, I think the issue may be here.
fivdi/onoff#191
or here:
fivdi/epoll#26

I will continue to experiment. Right now, my delegation simply causes setPin() to be called on the PCF857X for each pin following each interrupt. If 4 pins changed, we do 4 writes, if 3, 3 writes, etc...

The pollQueue is the appropriate solution. Just want to determine if we should also expand pollQueue to also handle writes.

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 22, 2023

Ignore this for now - See comment below with PeterStuff3.zip attached.
For now don't worry about queuing writes. I think pcf8574 is good and issue is somewhere else. Finalize your changes on the branch and I will work to fully identify/isolate the other issue.

Happy Holidays.

@lynniemagoo
Copy link
Contributor Author

CODE STYLE ISSUE? - See Line 309 of pcf857x.ts. You are missing a 'break' on the 2nd case of the switch. Do you also want to add 'default:' to your switches?

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 22, 2023

Peter,
I started seeing this same problem in my MCP23017 derivative today and was scratching my head as to where the problem was? Was it in my larger project with tiered event listeners and event forwarding or was it someplace else. When in doubt, try to reproduce with the simplest code base.

So, I went back to simple basic modified example.js to do exactly what my larger application is doing but without extra event listeners.
Attached is a zip with README as well as the log. (See Example2.js) which is coded to match the pin setups I called out above.

I'm simply pressing and releasing 4 buttons from pics above at the same time. I am able to reproduce this issue rather quickly. From initial review, I first thought the cause was a single interrupt containing 2 pin changes which causes 2 writes but the 2nd interrupt that occurred has 2 inputs that changed.

So, from all my comments above, ignore all of them for now please except for this one and the one right above regarding code style (missing the break; and default: in switches).

PeterStuff3.zip

This is starting to make me think the issue about missing interrupts, although we did see rejects without the poll queue was a combination of the on-off issue above as well as the reject when actively polling.

If you get a chance, please review the logs in the zip files here. Start with PeterStuff4.zip.
PeterStuff4.zip

I've annotated the example3a failure run. I will admit the problem takes longer to reproduce when we use process.nextTick() to do the pin writes but it is still there.

Here's my final assessment:

Root Cause: Missing GPIO interrupts from epoll underneath onoff.

The pcf857x.ts code is good. The only way to truly solve this problem will be using a combination of interrupt-driven and timer-based polling. My thoughts are to use setInterval polls of 250ms to get things back in sync.

Curious about your thoughts about this...
I'm of the opinion that we simply document that GPIO-based interrupts may not always trigger as we expect them to.

But....
Another possible workaround I was thinking about is when the interrupt is triggered, have the module emit a separate event 'interrupt' and then the application could use this to trigger a poll via setTimeout in 250ms, for example, after the last interrupt which would avoid using setInterval altogether. IMO, this is not a great idea as we are moving the problem into PCF857X code but having this event would allow the user to rely on a single event rather than having to trigger a timed poll when an input pin changes and the 'input' event is emitted.

Let me know your thoughts.

Lyndel

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 22, 2023

It's 1:06am here in Texas. I've now just reproduced the same issue on the MCP23017 module. The problem is either 1 of 2 things. 1) The epoll missed interrupt issue or 2) Mechanical switches that are generating too much noise for chips. The inputs are all in pullup mode and the mechanical switches when closed, ground the pin.

I'm of the opinion that this is an epoll scenario as it does not occur regularly and in a repeatable timeframe.

I will make plans in my larger application to simply add an configurable interrupt recovery poll interval that can be adjusted as needed. 500-1000ms should be adequate. I'm testing with 10000ms now and it allows me to see that indeed the timed poll will clear the issues.

The behavior I am seeing for both these chips is that we typically get an interrupt per-pin. So, for the MCP23017, we could in theory get 16 quick interrupts. For PCF8574, we could get up to 8, and for the PCF8575, we could get up to 16 all based on the number of pins on the chip. Hypothetically, the first interrupt would complete way before the last interrupt occurs.

To be 100% sure we are capable of servicing all possible interrupts, I recommend we do the following. Set the promise-queue max depth to the number of pins on the chip (as at the time promise-queue is constructed, we don't know which pins will be used as inputs vs outputs) plus an additional 1 for a manual poll. This is indeed overkill but does limit the memory footprint as we both want and should also support both manual polls as well as interrupt-driven polls.

Getting some rest now. Hope these notes don't overwhelm too much.

Regards,
Lyndel

@lynniemagoo
Copy link
Contributor Author

I guess I lied. I'm still awake. I've done additional testing with maxQueueDepth = 3 and I still cannot get a reject to happen. So my recommendation above is likely unneeded. Let's go with our original agreed value of 3 for now to be able to handle up to 4 polls (1 acttive and 3 in queue).

@lynniemagoo
Copy link
Contributor Author

This may be our issue:

Consider the code from onoff that uses raw epoll events if debounce is not active (which is the case for PCF8574):

  // A poller is created for both inputs and outputs. A poller isn't
  // actually needed for an output but the setDirection method can be
  // invoked to change the direction of a GPIO from output to input and
  // then a poller may be needed.
  const pollerEventHandler = (err, fd, events) => {
    // LRM - If epoll misses sending an event when pin goes low, PCF8574 will never see a transition because it only monitors falling edge.  If when pin goes high again, we won't see the change because we are not looking for rising edge.
    const value = gpio.readSync();

    if ((value === LOW && gpio._fallingEnabled) ||
        (value === HIGH && gpio._risingEnabled)) {
      gpio._listeners.slice(0).forEach(callback => {
        callback(err, value);
      });
    }
  };

The PCF module is only configured for the 'falling' edge. This means we only poll on 'falling' detection when interrupt is low. If we were to configure for 'both', this would mean we'd be queuing 2 promises and doing double reads. But, if we onoff is not sent an event on an interrupt transitioning to low, we'd catch it on the high side when the chip is read and interrupt is cleared.

I will test this later today and get back with my findings and we can discuss if we want to change or allow the rising/falling to be configured.

@crycode-de
Copy link
Owner

Thanks again for your deep investigation! I'll have a deeper look at all this later today.

Your buttons seem not to be hardware debounced. Maybe this causes some of your issues? If you press such a button it will bounce several times what may cause multiple interrupts at the port expander. If the interrupt/poll logic is fast enough you will get multiple input change events or false reads.
Think about something like pressing the button (low) -> interrupt triggers the poll -> the button bounces to high -> read (no change detected since bounced high is read) -> bounce to low -> and so on...
In theory the final bounce to low should trigger an interrupt causing a read as well, but there may be some timings where this isn't the case.

If someone is having trouble using the internal interrupt logic, he could also setup the gpio outside of this module and just call doPoll(). But emitting an interrupt event should be no problem too. I'll add this.

Yeah 16 Inputs may trigger 16 interrupts in theory. But the current queue length should be enough nevertheless since one read can detect multiple changes at once. Event if some interrupts may occur if the queue is full (and so a new poll will be rejected), the final read will get the changes.

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 22, 2023

OK. We are 3 days away from Christmas. Let's think this over the next few days before finalizing a decision.

The thing I dislike most about Github is that there is no collaboration tool other than issue notes, etc. IMO, email, zoom, or facetime is better place for these discussions as sometimes lots of trial and error are needed.

However, I want to drop in a 3 notes after sleeping a few hours.

  1. The module in its current form is simple, elegant, and easy to use. Because of this, even I chose it as a model for other chips. The only thing it is really missing is what I call out in 3) below. If we can add this, in addition to the poll queue, we can leave the new code on the branch for now and revisit after the holidays if you are open to that. :-)

  2. After further thought, Adding an 'interrupt' event would indeed allow user to manually call poll but adds no real benefit. Why? Because the user would only have to manually invoke doPoll which is exactly what you stated above and if that were the case, the user could manage their own GPIO and interrupt logic. After testing my MCP module and losing events from epoll or hardware debouncing, there are 2 paths forward that I see.

a) Desired Enhancement - Support 'both' for the GPIO rising + falling which results in 2 polls. additional optional parameter 'gpioEdge: EdgeType' enableInterrupt() with default being 0x00 falling. (enum EdgeType{ falling = 0x00, rising = 0x01, both=0x02} )

b) User manages GPIO externally and calls doPoll().
As user is managing GPIO, they would determine the edge as well.

Option b) might be best but as PCF8574 manages gpio.unexport as well, I'd opt for also supporting Option a) above.

As I stated in 1) above - The module is simple, easy to use. If we could also support a) the user can chose whichever best meets their needs (for me, I would simply change my application layer to enableInterrupts using Both and let the PCF/CAT/MCP modules do the rest).

  1. Strongly Recommended Parallel Behavior - Single Read, Single Write
    Currently, the module support a single read but to update pin states, multiple writes are required.
    If we were to add a New 'read' Event that contains 3 fields;
    value - 8 or 16 bit number - result of I2C read.
    invertBitmask - 8 or 16 bit number (this._inverted)
    inputBitmask - 8 or 16 bit number (this._inputBitmask)

The benefit of such event would allow a user via a single event to manage the chip as a whole (not pin-by-pin). Adding such an event, requires a small change to setAllPins. We would need to accept either a boolean or a number as a single parameter, update the internal state based on values contained within. (by within, I mean honoring the inputBitMask, invertMask, etc).

This would also align I2C write counts with read counts versus the user having to call setPin(pin, value) either 8 or 16 times resulting in 8 or 16 I2C writes.

Let me know what you think!

@lynniemagoo
Copy link
Contributor Author

lynniemagoo commented Dec 22, 2023

Attached here is a zip that includes changes above with the exception of the new event. I defer to your guidance and strengths on the declaration of the event. However, I did implement the change to setAllPins to support both number and boolean types. Hope it meets with your approval.

I'm not too happy about re-exporting type GpioEdge but it seemed the simplest way to do this - Thoughts on how we might do it better?
partial-source-changes.zip

Overall, I think the code changes came out clean. I also ran es-lint to be able to tidy things up for you to review.

I ran my previous example3.js file supplying 'both' for the Edge and I've not been able to reproduce any failures as 'both', although not ideal (requires double reads if see both edges), will adequately handle either the epoll or hardware issues (not sure which it is)

EDIT - No sooner than I'd completed typing this post, I went back to my test harness and even 'both' does not address the issue. I'm still seeing failures. So for now, let's consider only adding new 'read' event and updates to setAllPins to support both number and boolean.

Lyndel

@lynniemagoo
Copy link
Contributor Author

Greetings Peter,

I think I'm finally done for the day. Let me know what you think of this.

Attached are the changes that I propose we release. I've not updated the README nor have I updated examples in the attached but I did test the new 'poll' event locally and it works perfectly.
final-proposal-src.zip

Unzip above over the poll-queue branch src folder, run a diff and you will see my edits.

All this being said, I'm still missing an interrupt every now and then. I suspect that the PCF/MCP is not sending it to the Pi. But, if it is, then it is being lost in epoll.

Either way, anything more we do in this area of interrupt issues is a IMO, a waste of time.

The only thing I can think of now that might be improved because it causes multiple reads/writes during startup are the setInputPin and setOutputPin. Would be very nice if we had a single function like:

configurePins(directionBitmask: number, /* 1 bit indicates input pin, 0 bit indicates output - Matches PinDirection */, 
  invertBitmask:number /* inversion mask to apply*/, 
  value:number /* value to be written to any output pins*/))

// internally, this would setup _currentState and other things followed by I2C write then poll
 

@lynniemagoo
Copy link
Contributor Author

In the zip I attached above, I accidentally did not remote the optional Edge parameter from enableInterrupt. I'm thinking that if we want to leave it in, we need to re-export a type GpioEdge that aliases to the Edge type in the onoff module.

If you don't want to leave it in, then we can revert the import line at the top and then hard-code to 'falling' as was before.

@lynniemagoo
Copy link
Contributor Author

Peter take a look @ my private repo that I added you to. The master branch is promise-based, there's also a promise and a callback branch but Master has what think will benefit us both. I've been testing it and I have some news to share once you are back from the new year.

Lyndel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants