Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary cache: async push_success #908

Draft
wants to merge 57 commits into
base: main
Choose a base branch
from

Conversation

autoantwort
Copy link
Contributor

@autoantwort autoantwort commented Feb 15, 2023

This results in ~10-20% faster build times on my machine.

For example building boost on my M1 mac went down from 2.948 min to 2.375 min

@autoantwort autoantwort marked this pull request as draft February 15, 2023 20:04
@autoantwort
Copy link
Contributor Author

How or when should "upload messages" (like Uploaded binaries to {count} {vendor}.) be printed?

@Thomas1664
Copy link
Contributor

Doesn't this have the same problem as #694 that the working thread might exit due to calls to check_exit or value_or_exit?

@autoantwort
Copy link
Contributor Author

Doesn't this have the same problem as #694 that the working thread might exit due to calls to check_exit or value_or_exit?

Kind of. In general we need an option to decide if a binary cache failure should be a hard error or only a warning

@Thomas1664
Copy link
Contributor

Kind of. In general we need an option to decide if a binary cache failure should be a hard error or only a warning

The problem is that we almost never can be sure that there isn't some nested API call that exits on failure. But it seems like #909 at least partially addresses this issue.

@autoantwort
Copy link
Contributor Author

Yeah but in the binary cache are nearly no hard exists. It currently also only prints warnings.

Copy link
Contributor

@ras0219-msft ras0219-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this direction; unblocking I/O work has great potential for making vcpkg much faster.

However we need to be very careful about the impacts of concurrency -- deadlocks suck :(

include/vcpkg/binarycaching.h Outdated Show resolved Hide resolved
src/vcpkg/install.cpp Outdated Show resolved Hide resolved
src/vcpkg/binarycaching.cpp Outdated Show resolved Hide resolved
src/vcpkg/binarycaching.cpp Outdated Show resolved Hide resolved
src/vcpkg/binarycaching.cpp Outdated Show resolved Hide resolved
include/vcpkg/binarycaching.h Outdated Show resolved Hide resolved
src/vcpkg.cpp Outdated
@@ -156,6 +157,7 @@ namespace vcpkg::Checks
// Implements link seam from basic_checks.h
void on_final_cleanup_and_exit()
{
BinaryCache::wait_for_async_complete();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we can do this here. This is on the critical path for ctrl-c handling and should only be used for extremely fast, emergency tear-down behavior (like restoring the console).

If there happens to be an exit anywhere in any BinaryCache implementation, this would deadlock. Importantly, this include any sort of assertion we might want to do, like checking pointers for null.

Unfortunately, the only path forward I see is to call this (or appropriately scope the BinaryCache itself) at the relevant callers. The consequence of possibly not uploading some set of binary caches in the case of some unhandled program error (such as permissions issue on a directory expected to be writable) is vastly preferable to deadlocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the BinaryCache::wait_for_async_complete() implementation so it does not deadlock anymore.

I also moved the call to Checks::exit_with_code which is not called when crtl+c is handled. (I personally would like to have a way to terminate vcpkg but wait until the binary cache is done so that I don't lose progress.)

And I prefer it when build packages are uploaded to the binary caches before vcpkg exits because of an error, otherwise I have to build the already build packages again at a later point when there is no cache entry.

Copy link
Contributor

@ras0219-msft ras0219-msft Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that it is, desirable to finish uploading on "understood" errors. For example, if a package failed to build or failed to be installed.

I was also wrong about my original assessment of a deadlock. My concern was the call path of the binary upload thread calling Checks::unreachable() or .value_or_exit(), but it seems that std::thread::join() does have a carve-out to handle this specific case: it will throw a resource_deadlock_would_occur if you try to join yourself.

I've put some other concerns below, but I don't want those to distract from my main point: We must make it as trivial / correct-by-construction as possible to guarantee that the binary cache thread NEVER attempts to wait on itself. I think the best approach for vcpkg right now is to add calls from Install::perform() etc to BinaryCache::wait_for_async_complete() before any "user-facing" error, such as the exit guarded by result.code != BuildResult::SUCCEEDED && keep_going == KeepGoing::NO. This is motivated by the perspective that it's always safer to terminate than to join and possibly deadlock / race condition / etc.


There's still a UB data race if the main thread and binary upload thread attempt to exit at the same time:

Concurrently calling join() on the same thread object from multiple threads constitutes a data race that results in undefined behavior.
-- https://en.cppreference.com/w/cpp/thread/thread/join

There's also a serious "scalability" problem if we ever want a second background thread for whatever reason, because BGThread A would join on BGThread B, while BGThread B tries to join on BGThread A. This might be solvable with ever more complex structures, such as a thread ownership DAG that gets threads to join only on their direct children, but I don't think the benefit is worth the cost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UB and the joining itself could simply be prevented by doing a if (std::this_thread::get_id() == instance->push_thread.get_id()). My concern with the explicit approach is that it is easy to forget to call the waiting function of the BinaryCache and every time you want to exit you have to remember to call it. This seems to me to be very prone to human error.

Copy link
Contributor Author

@autoantwort autoantwort Mar 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now implemented your request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ras0219-msft Is there anything left that is preventing this PR from being merged?

src/vcpkg/binarycaching.cpp Outdated Show resolved Hide resolved
src/vcpkg/binarycaching.cpp Outdated Show resolved Hide resolved
src/vcpkg/binarycaching.cpp Outdated Show resolved Hide resolved
@autoantwort autoantwort marked this pull request as ready for review March 5, 2023 20:28
# Conflicts:
#	src/vcpkg/commands.ci.cpp
#	src/vcpkg/commands.install.cpp
#	src/vcpkg/commands.set-installed.cpp
#	src/vcpkg/commands.upgrade.cpp
@BillyONeal BillyONeal self-assigned this Oct 4, 2023
Copy link
Contributor

@Thomas1664 Thomas1664 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nitpicks.
I think we should use size_t instead of int when we're using a variable effectively as an index inside a loop.
Furthermore, I noticed that some variable names are just letters. For me, this is confusing because I don't know what the variable is for a few lines later.


BGMessageSink m_bg_msg_sink;
BGThreadBatchQueue<ActionToPush> m_actions_to_push;
std::atomic_int m_remaining_packages_to_push = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::atomic_int m_remaining_packages_to_push = 0;
std::atomic_size_t m_remaining_packages_to_push = 0;

I think this should be size_t because we effectively could have size_t packages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This member existing at all is a smell to me. The number of items in the queue should be a part of the queue, not something tracked externally. Moreover, the queue already contains locks and stuff so I'm not sure why we need an atomic here.

src/vcpkg/binarycaching.cpp Outdated Show resolved Hide resolved
src/vcpkg/binarycaching.cpp Outdated Show resolved Hide resolved
src/vcpkg/binarycaching.cpp Outdated Show resolved Hide resolved
src/vcpkg/binarycaching.cpp Outdated Show resolved Hide resolved
src/vcpkg/binarycaching.cpp Show resolved Hide resolved
src/vcpkg/base/message_sinks.cpp Outdated Show resolved Hide resolved
Copy link
Member

@BillyONeal BillyONeal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'm going to tackle the structural error reporting etc. thing now that I think this change is really exposing / crying out for. In particular, I think we should consider a design which more formally associates chunks of console output with the package in question rather than a free for all on the message sink type requiring a lot of 'reverse engineer from stuff the caller already knew' kind of behavior.
  2. I think the discussion of how we are achieving thread safety across several large components like this needs the comment like I described in a comment.
  3. The comment about 'BinaryCache is supposed to hide unique_ptr' should get fixed, which will also eliminate a lot of the changes in this PR attempting to adapt to that change.

When I have a solution for (1) are you interested in me just pushing changes into this PR or do you want a PR for your PR?

(By and large I think this change is structurally good)


BinaryCache(const Filesystem& fs);
BinaryCache(const BinaryCache&) = delete;
BinaryCache(BinaryCache&&) = default;
BinaryCache(BinaryCache&&) = delete;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this design change. BinaryCache itself already contains several unique_ptr-alikes, and intends to firewall that storage decision from its customers. If there are immovable bits the unique_ptr juggling should happen inside rather than outside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment about 'BinaryCache is supposed to hide unique_ptr' should get fixed, which will also eliminate a lot of the changes in this PR attempting to adapt to that change.

How would you implement that? The function push_thread_main uses the members of the class, which gets moved away if you move the BinaryCache. So I guess using pimpl is the only solution here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now put all data in an extra struct that is hold via a std::unique_ptr by the BinaryCache class.

include/vcpkg/binarycaching.h Show resolved Hide resolved
include/vcpkg/base/message_sinks.h Show resolved Hide resolved

BGMessageSink m_bg_msg_sink;
BGThreadBatchQueue<ActionToPush> m_actions_to_push;
std::atomic_int m_remaining_packages_to_push = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This member existing at all is a smell to me. The number of items in the queue should be a part of the queue, not something tracked externally. Moreover, the queue already contains locks and stuff so I'm not sure why we need an atomic here.


bool empty() const { return forward.empty(); }

void pop(std::vector<T>& out)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this thing being called queue given that this is how it works. Given that we expect this to be a multi producer single consumer queue, can we instead put the vector inside and note that only one thread may call pop but any number of threads may call push? That would also resolve the criticism over separate tracking atomics below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BatchQueue alone is not thread safe.

Given that we expect this to be a multi producer single consumer queue, can we instead put the vector inside and note that only one thread may call pop but any number of threads may call push?

I don't get what you have in mind here 😅

I don't like this thing being called queue given that this is how it works.

Do you have an idea for a better name? :)

Comment on lines +119 to +120
std::lock_guard<std::mutex> print_lk(m_print_directly_lock);
std::lock_guard<std::mutex> lk(m_published_lock);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::lock_guard<std::mutex> print_lk(m_print_directly_lock);
std::lock_guard<std::mutex> lk(m_published_lock);
std::lock_guard<std::mutex, std::mutex> lk(m_print_directly_lock, m_published_lock);

if this survives

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you actually mean std::scoped_lock.

@autoantwort
Copy link
Contributor Author

When I have a solution for (1) are you interested in me just pushing changes into this PR or do you want a PR for your PR?

You can just push to this PR. I will look through the other comments in the following days :)

# Conflicts:
#	include/vcpkg/base/message-data.inc.h
#	locales/messages.json
@autoantwort autoantwort marked this pull request as draft November 12, 2023 02:19
BillyONeal added a commit to BillyONeal/vcpkg-tool that referenced this pull request Dec 21, 2023
…ome clear that we need a thread safe channel through which errors and warnings can both be reasonably reported. Now that microsoft#1279 is landed and functionally everything in the codebase uses ExpectedL, we can look at what the new thing that fixes issues is.

Consider the following:

```c++
    ExpectedL<T> example_api(int a);

    ExpectedL<std::unique_ptr<SourceControlFile>> try_load_port_manifest_text(StringView text,
                                                                              StringView control_path,
                                                                              MessageSink& warning_sink);
```

The reason this can't return the warnings through the ExpectedL channel is that we don't want the 'error' state to be engaged when there are merely warnings. Moreover, that these channels are different channels means that situations that might want to return errors and warnings together, as happens when parsing files, means that order relationships between errors and warnings is lost. It is probably a good idea in general to put warnings and errors about the same location next to each other in the output, but that's hard to do with this interface.

Rather than multiplexing everything through the return value, this proposal is to multiplex only the success or failure through the return value, and report any specific error information through an out parameter.

1. Distinguish whether an overall operation succeeded or failed in the return value, but
2. record any errors or warnings via an out parameter.

Applying this to the above gives:

```c++
    Optional<T> example_api(MessageContext& context, int a);

    // unique_ptr is already 'optional'
    std::unique_ptr<SourceControlFile> try_load_port_manifest_text(MessageContext& context,
                                                                   StringView text,
                                                                   StringView control_path);
```

Issues this new mechanism fixes:

* Errors and warnings can share the same channel and thus be printed together
* The interface between code wanting to report events and the code wanting to consume them is a natural thread synchronization boundary. Other attempts to fix this have been incorrect by synchronizing individual print calls ( microsoft#1290 ) or complex enough that we are not sure they are correct by trying to recover boundaries by reparsing our own error output ( microsoft#908 )
* This shuts down the "error: error:" and similar bugs where it isn't clear who is formatting the overall error message vs. talking about individual components

Known issues that are not fixed by this change:

* This still doesn't make it easy for callers to programmatically handle specific types of errors. Currently, we have some APIs that still use explicit `std::error_code` because they want to do different things for 'file does not exist' vs. 'there was an I/O error'. Given that this condition isn't well served by the ExpectedL mechanism I don't want to wait until we have a better solution to it to proceed.
* Because we aren't making the context parameter the 'success carrier' it's more complex to implement 'warnings as errors' or similar functionality where the caller decides how 'important' something is. I would be in favor of moving all success tests to the context parameter but I'm not proposing that because the other vcpkg maintainers do not like it.
* Contextual information / stack problems aren't solved. However, the context parameter might be extended in the future to help with this.
BillyONeal added a commit to BillyONeal/vcpkg-tool that referenced this pull request Dec 22, 2023
…ome clear that we need a thread safe channel through which errors and warnings can both be reasonably reported. Now that microsoft#1279 is landed and functionally everything in the codebase uses ExpectedL, we can look at what the new thing that fixes issues is.

Consider the following:

```c++
    ExpectedL<T> example_api(int a);

    ExpectedL<std::unique_ptr<SourceControlFile>> try_load_port_manifest_text(StringView text,
                                                                              StringView control_path,
                                                                              MessageSink& warning_sink);
```

The reason this can't return the warnings through the ExpectedL channel is that we don't want the 'error' state to be engaged when there are merely warnings. Moreover, that these channels are different channels means that situations that might want to return errors and warnings together, as happens when parsing files, means that order relationships between errors and warnings is lost. It is probably a good idea in general to put warnings and errors about the same location next to each other in the output, but that's hard to do with this interface.

Rather than multiplexing everything through the return value, this proposal is to multiplex only the success or failure through the return value, and report any specific error information through an out parameter.

1. Distinguish whether an overall operation succeeded or failed in the return value, but
2. record any errors or warnings via an out parameter.

Applying this to the above gives:

```c++
    Optional<T> example_api(MessageContext& context, int a);

    // unique_ptr is already 'optional'
    std::unique_ptr<SourceControlFile> try_load_port_manifest_text(MessageContext& context,
                                                                   StringView text,
                                                                   StringView control_path);
```

Issues this new mechanism fixes:

* Errors and warnings can share the same channel and thus be printed together
* The interface between code wanting to report events and the code wanting to consume them is a natural thread synchronization boundary. Other attempts to fix this have been incorrect by synchronizing individual print calls ( microsoft#1290 ) or complex enough that we are not sure they are correct by trying to recover boundaries by reparsing our own error output ( microsoft#908 )
* This shuts down the "error: error:" and similar bugs where it isn't clear who is formatting the overall error message vs. talking about individual components

Known issues that are not fixed by this change:

* This still doesn't make it easy for callers to programmatically handle specific types of errors. Currently, we have some APIs that still use explicit `std::error_code` because they want to do different things for 'file does not exist' vs. 'there was an I/O error'. Given that this condition isn't well served by the ExpectedL mechanism I don't want to wait until we have a better solution to it to proceed.
* Because we aren't making the context parameter the 'success carrier' it's more complex to implement 'warnings as errors' or similar functionality where the caller decides how 'important' something is. I would be in favor of moving all success tests to the context parameter but I'm not proposing that because the other vcpkg maintainers do not like it.
* Contextual information / stack problems aren't solved. However, the context parameter might be extended in the future to help with this.
BillyONeal added a commit to BillyONeal/vcpkg-tool that referenced this pull request Dec 29, 2023
…ome clear that we need a thread safe channel through which errors and warnings can both be reasonably reported. Now that microsoft#1279 is landed and functionally everything in the codebase uses ExpectedL, we can look at what the new thing that fixes issues is.

Consider the following:

```c++
    ExpectedL<T> example_api(int a);

    ExpectedL<std::unique_ptr<SourceControlFile>> try_load_port_manifest_text(StringView text,
                                                                              StringView control_path,
                                                                              MessageSink& warning_sink);
```

The reason this can't return the warnings through the ExpectedL channel is that we don't want the 'error' state to be engaged when there are merely warnings. Moreover, that these channels are different channels means that situations that might want to return errors and warnings together, as happens when parsing files, means that order relationships between errors and warnings is lost. It is probably a good idea in general to put warnings and errors about the same location next to each other in the output, but that's hard to do with this interface.

Rather than multiplexing everything through the return value, this proposal is to multiplex only the success or failure through the return value, and report any specific error information through an out parameter.

1. Distinguish whether an overall operation succeeded or failed in the return value, but
2. record any errors or warnings via an out parameter.

Applying this to the above gives:

```c++
    Optional<T> example_api(MessageContext& context, int a);

    // unique_ptr is already 'optional'
    std::unique_ptr<SourceControlFile> try_load_port_manifest_text(MessageContext& context,
                                                                   StringView text,
                                                                   StringView control_path);
```

Issues this new mechanism fixes:

* Errors and warnings can share the same channel and thus be printed together
* The interface between code wanting to report events and the code wanting to consume them is a natural thread synchronization boundary. Other attempts to fix this have been incorrect by synchronizing individual print calls ( microsoft#1290 ) or complex enough that we are not sure they are correct by trying to recover boundaries by reparsing our own error output ( microsoft#908 )
* This shuts down the "error: error:" and similar bugs where it isn't clear who is formatting the overall error message vs. talking about individual components

Known issues that are not fixed by this change:

* This still doesn't make it easy for callers to programmatically handle specific types of errors. Currently, we have some APIs that still use explicit `std::error_code` because they want to do different things for 'file does not exist' vs. 'there was an I/O error'. Given that this condition isn't well served by the ExpectedL mechanism I don't want to wait until we have a better solution to it to proceed.
* Because we aren't making the context parameter the 'success carrier' it's more complex to implement 'warnings as errors' or similar functionality where the caller decides how 'important' something is. I would be in favor of moving all success tests to the context parameter but I'm not proposing that because the other vcpkg maintainers do not like it.
* Contextual information / stack problems aren't solved. However, the context parameter might be extended in the future to help with this.
BillyONeal added a commit to BillyONeal/vcpkg-tool that referenced this pull request Jan 26, 2024
…ome clear that we need a thread safe channel through which errors and warnings can both be reasonably reported. Now that microsoft#1279 is landed and functionally everything in the codebase uses ExpectedL, we can look at what the new thing that fixes issues is.

Consider the following:

```c++
    ExpectedL<T> example_api(int a);

    ExpectedL<std::unique_ptr<SourceControlFile>> try_load_port_manifest_text(StringView text,
                                                                              StringView control_path,
                                                                              MessageSink& warning_sink);
```

The reason this can't return the warnings through the ExpectedL channel is that we don't want the 'error' state to be engaged when there are merely warnings. Moreover, that these channels are different channels means that situations that might want to return errors and warnings together, as happens when parsing files, means that order relationships between errors and warnings is lost. It is probably a good idea in general to put warnings and errors about the same location next to each other in the output, but that's hard to do with this interface.

Rather than multiplexing everything through the return value, this proposal is to multiplex only the success or failure through the return value, and report any specific error information through an out parameter.

1. Distinguish whether an overall operation succeeded or failed in the return value, but
2. record any errors or warnings via an out parameter.

Applying this to the above gives:

```c++
    Optional<T> example_api(MessageContext& context, int a);

    // unique_ptr is already 'optional'
    std::unique_ptr<SourceControlFile> try_load_port_manifest_text(MessageContext& context,
                                                                   StringView text,
                                                                   StringView control_path);
```

Issues this new mechanism fixes:

* Errors and warnings can share the same channel and thus be printed together
* The interface between code wanting to report events and the code wanting to consume them is a natural thread synchronization boundary. Other attempts to fix this have been incorrect by synchronizing individual print calls ( microsoft#1290 ) or complex enough that we are not sure they are correct by trying to recover boundaries by reparsing our own error output ( microsoft#908 )
* This shuts down the "error: error:" and similar bugs where it isn't clear who is formatting the overall error message vs. talking about individual components

Known issues that are not fixed by this change:

* This still doesn't make it easy for callers to programmatically handle specific types of errors. Currently, we have some APIs that still use explicit `std::error_code` because they want to do different things for 'file does not exist' vs. 'there was an I/O error'. Given that this condition isn't well served by the ExpectedL mechanism I don't want to wait until we have a better solution to it to proceed.
* Because we aren't making the context parameter the 'success carrier' it's more complex to implement 'warnings as errors' or similar functionality where the caller decides how 'important' something is. I would be in favor of moving all success tests to the context parameter but I'm not proposing that because the other vcpkg maintainers do not like it.
* Contextual information / stack problems aren't solved. However, the context parameter might be extended in the future to help with this.
BillyONeal added a commit that referenced this pull request Dec 7, 2024
* In several PRs, such as #908 and #1210 , it has become clear that we need a thread safe channel through which errors and warnings can both be reasonably reported. Now that #1279 is landed and functionally everything in the codebase uses ExpectedL, we can look at what the new thing that fixes issues is.

Consider the following:

```c++
    ExpectedL<T> example_api(int a);

    ExpectedL<std::unique_ptr<SourceControlFile>> try_load_port_manifest_text(StringView text,
                                                                              StringView control_path,
                                                                              MessageSink& warning_sink);
```

The reason this can't return the warnings through the ExpectedL channel is that we don't want the 'error' state to be engaged when there are merely warnings. Moreover, that these channels are different channels means that situations that might want to return errors and warnings together, as happens when parsing files, means that order relationships between errors and warnings is lost. It is probably a good idea in general to put warnings and errors about the same location next to each other in the output, but that's hard to do with this interface.

Rather than multiplexing everything through the return value, this proposal is to multiplex only the success or failure through the return value, and report any specific error information through an out parameter.

1. Distinguish whether an overall operation succeeded or failed in the return value, but
2. record any errors or warnings via an out parameter.

Applying this to the above gives:

```c++
    Optional<T> example_api(MessageContext& context, int a);

    // unique_ptr is already 'optional'
    std::unique_ptr<SourceControlFile> try_load_port_manifest_text(MessageContext& context,
                                                                   StringView text,
                                                                   StringView control_path);
```

Issues this new mechanism fixes:

* Errors and warnings can share the same channel and thus be printed together
* The interface between code wanting to report events and the code wanting to consume them is a natural thread synchronization boundary. Other attempts to fix this have been incorrect by synchronizing individual print calls ( #1290 ) or complex enough that we are not sure they are correct by trying to recover boundaries by reparsing our own error output ( #908 )
* This shuts down the "error: error:" and similar bugs where it isn't clear who is formatting the overall error message vs. talking about individual components

Known issues that are not fixed by this change:

* This still doesn't make it easy for callers to programmatically handle specific types of errors. Currently, we have some APIs that still use explicit `std::error_code` because they want to do different things for 'file does not exist' vs. 'there was an I/O error'. Given that this condition isn't well served by the ExpectedL mechanism I don't want to wait until we have a better solution to it to proceed.
* Because we aren't making the context parameter the 'success carrier' it's more complex to implement 'warnings as errors' or similar functionality where the caller decides how 'important' something is. I would be in favor of moving all success tests to the context parameter but I'm not proposing that because the other vcpkg maintainers do not like it.
* Contextual information / stack problems aren't solved. However, the context parameter might be extended in the future to help with this.
Comment on lines +98 to +102
std::vector<std::pair<Color, std::string>> m_published;

// buffers messages until newline is reached
// guarded by m_print_directly_lock
std::vector<std::pair<Color, std::string>> m_unpublished;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BillyONeal Now that you have implemented the "error document" type stuff, I am right that this should be a DiagnosticLine etc. instead.

Copy link
Member

@BillyONeal BillyONeal Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure yet, I ran into a problem where I'm not sure exactly how "status" type information should be conveyed through this infrastructure; I'm working on it over here: main...BillyONeal:vcpkg-tool:message-sink-line

As part of that I realized that #1137 touches the same area and would be easier to merge first and I'm doing that right now...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does a "status" need to be conveyed here at all? This are only informational messages.
Or what is the problem you are trying to solve here?

Copy link
Member

@BillyONeal BillyONeal Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some messages, like "I am about to touch the network now", are time sensitive and under normal conditions must be printed. However, errors/warnings like "the download failed" might need to be suppressed, if a subsequent retry / alternate makes the overall process succeed.

For example, when downloading a file, in "time order" let's say this happens:

  1. We try to download the file from an asset cache. This should get a timely message because it touches the network.
  2. Attempting to contact the asset cache fails. This normally emits an error.
  3. We try to download the file from upstream. This needs a timely message because it touches the network.
  4. Download from the real upstream succeeds. We need to 'go back in time' and not emit the error from #2. This means we can't print that error when it happens, we have to buffer it until we know it's actually going to happen or not. But we normally must not buffer #1.
  5. We try to submit the freshly downloaded file back to the asset cache. This needs a timely message because it touches the network.
  6. Submitting back to the asset cache fails for some reason. That normally emits an error, but in this context needs to be reduced to a warning because we can continue without it.

If any of this is happening on 'background' threads, even the 'timely' messages need to be held until the next synchronization point with the thread that owns the console. This is what 'statusln' is for in my WIP. They need to share one channel rather than just passing both MessageSink and DiagnosticContext to handle this background case where a caller wants to keep the original-time-order of diagnostics and status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any of this is happening on 'background' threads, even the 'timely' messages need to be held until the next synchronization point with the thread that owns the console.

This PR already does this (for the scope of the PR)

background case where a caller wants to keep the original-time-order of diagnostics and status

That is already the case in the PR. But yeah this PR don't know when a message ends (if it spans multiple lines), the whole reason for the error document type stuff to be created. But could this not simply be solved by buffering DiagnosticLines instead and the use of DiagnostigContext instead of MessageSink in this PR, or what breaks then?
Currently: every message (regardless of the type) -> MessageSinkBuffer -> MessageSink
Future: every message (regardless of the type) -> DiagnosticLineBuffer -> MessageSink

Copy link
Member

@BillyONeal BillyONeal Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But could this not simply be solved by buffering DiagnosticLines instead and the use of DiagnostigContext instead of MessageSink in this PR, or what breaks then?

Then 'inner' parts have no way to emit the 'intended to be timely' messages. What I'm doing is:

  1. Add one function, statusln to DiagnosticContext where these timely messages go. Normal buffering of errors won't buffer these, but background thread stuff will. Notably, there is no non-ln version because it needs to be a reasonable buffering point. (Embedded newlines are fine, but callers need to assume there will be a newline there)
  2. Teach downloads.cpp et al. to use DiagnosticContext, which implied teaching system.process.cpp how to use DiagnosticContext.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then 'inner' parts have no way to emit the 'intended to be timely' messages.

Iiuc 'intended to be timely' = must be printed immediately: The inner parts that don't run in the background can print there stuff immediately via the MessageSink/DiagnosticContext and the stuff in the background thread is never allowed to print stuff immediately otherwise you get interleaved messages with the build output.
So I don't understand why we need this extra "print messages immediately" channel when the only possible states are "print everything immediately" or "print nothing immediately".
Maybe I should just wait and see what your resulting code looks like 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I don't understand why we need this extra "print messages immediately" channel when the only possible states are "print everything immediately" or "print nothing immediately".

No, there's a third condition, which is step 4 in my example above. We need to buffer errors and warnings from the inner operation, because we may want to swallow and not emit them if a subsequent attempt succeeds, but we must not buffer any of the timely status messages.

Example: https://github.com/BillyONeal/vcpkg-tool/blob/9aa671863a68ef90d0c355e4594bd9925d9df083/src/vcpkg/base/downloads.cpp#L926-L966

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah thanks for the example! :)
But iiuc this "problem" is not caused by this PR and already existed beforehand?

Copy link
Member

@BillyONeal BillyONeal Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it has nothing to do with your PR. It's a problem I ran into trying to DiagnosticContext-ize the downloads stuff, which I want to do in order to have confidence that your PR is correct. (This way everything the background thread might touch speaks the 'can be made background thread aware' language)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants