-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TLS ABC PEP draft #1
base: master
Are you sure you want to change the base?
Conversation
pep-0xxx.rst
Outdated
class in the ``ssl`` module. | ||
|
||
While it is technically possible to define (2) in terms of (3), for the sake of | ||
simplicity it is easier to define these as two separate ABCs. Impelementations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo “Impelementations”
pep-0xxx.rst
Outdated
class Context(metaclass=ABCMeta): | ||
@abstractmethod | ||
def register_certificates(self, | ||
certificates: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When designing a new API, we should take into account, that people might want to use TLS libs with HSMs and avoid loading keys into Python’s memory.
pep-0xxx.rst
Outdated
TLSBufferObject = Union[TLSWrappedSocket, TLSWrappedBuffer] | ||
ServerNameCallback = Callable[[TLSBufferObject, Optional[str], Context], Any] | ||
|
||
class Context(metaclass=ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is everyone 100% sure we should have only one Context class? Or would it be better to have a ClientContext and ServerContext and make them same for the time being? I don’t really know any other TLS API than OpenSSL but it seems thinkable that a lib might exist that makes a distinction? Also makes docstrings less weird and more concrete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely happy to have a client context and a server context. It's not clear to me that their APIs would differ at all though. If that's the case, I'm not sure how much value there is in doing it. But I'd like to see additional opinions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, actually, there's another good reason to want to do this: some TLS implementations may only want to offer one side of that connection. Like, SecureTransport can technically be a server, but is missing a whole bunch of hooks and functions to be able to do it effectively. So a SecureTransport-based implementation may just want to provide ClientContext
, to signal this limitation.
pep-0xxx.rst
Outdated
both of ALPN or NPN. | ||
|
||
``protocols`` should be a list of ASCII strings for ALPN tokens, | ||
such as ``['h2', 'http/1.1']``, ordered by preference. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t know a lot about ALPN but strings are terrible interfaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALPN is stringy. There's really no other way around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn’t mean our API has to be stringy though, does it? ISTM it’s about a few well-known strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ALPN registry can be regularly updated. I'm really reluctant to put us in a position where we have a well-typed API that requires CPython to keep up with an ICANN registry of tokens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about having one generic type that eats a string? Like:
protocols = [H2, HTTP_1_1, Custom("whatever")]
Having proper objects for well-known protocols seems like a good way to avoid (typo) errors and avoiding normalization (like 'h2 '
or 'H2'
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can get behind that I suppose.
pep-0xxx.rst
Outdated
pass | ||
|
||
@abstractmethod | ||
def set_ciphers(self, ciphers: List[Ciphers]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JFTR, we have a fairly generic solution to this in Twisted you may or may not have a look at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so my suspicion is that there are two ways to get generic support here. Probably the easiest, honestly, is to have the list of Ciphers be basically a fancy enumerated type that uses the full, formal names for cipher suites like the ones provided here. Those names should map to their TLS integer, and can then be de-mapped as needed by the individual implementations.
How does that sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a database of all known cipher suites, their hex ids and aliases for different TLS libs, https://raw.githubusercontent.com/tiran/tlsdb/master/tlsdb.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should use IANA/RFC as the canonical one and offer easy ways to translate including a way to extend so people aren’t ever bound to what’s inside?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would be my guess. Not sure what "easy way to extend" means though: some kind of wacky custom enum?
pep-0xxx.rst
Outdated
until more data is available in the input buffer. | ||
""" | ||
pass | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this all the errors we need? Any reason why there’s no validation errors? I’d like a whole hierarchy of ValidationError > ExpiredError/ServiceIdentityError/…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason I haven't defined a more complete exception hierarchy is because I'm reluctant to constrain what applications can provide. We could certainly provide an abstract validation error though, and then allow implementations to provide more specific subclasses if they are able.
Does that sound sensible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like to have one common base class for SSL exceptions. In the future ssl.CertificateError
should be come a subclass of ssl.SSLError
.
pep-0xxx.rst
Outdated
* It's annoying that there's no type signature for fileobj. Do I really have to | ||
define one as part of this PEP? Otherwise, how do I define the types of the | ||
arguments to ``wrap_buffers``? | ||
* Do we need ways to disable specific TLS versions? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenSSL 1.1.0 has deprecated all version specific protocols. For Python stdlib I like to move to PROTOCOL_TLS_CLIENT (client side onky method) and PROTOCOL_TLS_SERVER (server side only method) with auto-negotiation of the highest protocol number.
To set minimal and maximum TLS version I like to introduce a set_version_range(self, min=None, max=None)
function. None means minimum supported resp. maximum supported version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so I have no intention of exposing OpenSSL's weird handshake version thing. I literally mean a function call that would translate to setting OP_NO_*
. Your set_version_range
would likely be fine.
pep-0xxx.rst
Outdated
arguments to ``wrap_buffers``? | ||
* Do we need ways to disable specific TLS versions? | ||
* Do we need ways to control hostname validation? | ||
* Do we need ways to disable certificate validation altogether? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m afraid so. It’s impracticable to do otherwise. Even I had to disable it in one case in production because third parties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. To do this I have to define some level of optionality on this feature, because some TLS stacks may not be able to have their validation disabled (at least not easily).
pep-0xxx.rst
Outdated
define one as part of this PEP? Otherwise, how do I define the types of the | ||
arguments to ``wrap_buffers``? | ||
* Do we need ways to disable specific TLS versions? | ||
* Do we need ways to control hostname validation? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I’ve encountered the wildest things in practice. One EPP registry for example uses a service name instead of host name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so when we say "control hostname validation", what kind of control were you wanting? What kind of interface would that look like?
For example, SecureTransport can only do both SNI and hostname validation, or neither SNI nor hostname validation. So would it be sufficient to have a binary "verify hostname" property or switch, and then a way to extract relevant data from the cert? Or just the cert in DER form?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I guess. Maybe a callback even?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A callback might be a bit tricky, but I suppose it's the best way to handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we had a callback, does it take an x509 certificate of some kind? A list of SAN fields?
pep-0xxx.rst
Outdated
* Do we need ways to disable certificate validation altogether? | ||
* Do we need to support getpeercert? Should we always return DER instead of the | ||
weird semi-structured thing? | ||
* How do we load certs from locations on disk? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a subproblem I’ve mentioned before about HSMs. we need a generic solution to this. I hope someone who actually uses HSMs can help us here out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I think the various HSM questions here I shall defer to either @tiran, @alex, or @reaperhulk, all of whom have vastly more experience with HSMs than I do (which is to say, any at all).
"platform-native" experiences. | ||
|
||
|
||
Problems |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section to highlight the fact that the ssl
module is not an abstract API for implementing TLS (heh), but rather exposes OpenSSL implementation details as it's own.
Proposal | ||
======== | ||
|
||
This PEP proposes to introduce a few new Abstract Base Classes in Python 3.7 to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And 2.7.x!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also you probably want to indicate that the ssl
module will implement these intefaces :-)
pep-0xxx.rst
Outdated
TLSBufferObject = Union[TLSWrappedSocket, TLSWrappedBuffer] | ||
ServerNameCallback = Callable[[TLSBufferObject, Optional[str], Context], Any] | ||
|
||
class _BaseContext(metaclass=ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't subclass things, it's bad for your health.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API doesn't contain anything about session tickets/resumption or 0-RTT handshakes for TLS 1.3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to force that into the API? Or do we want to say that concrete implementations should wrap those details into their ClientContext or ServerContext objects?
pep-0xxx.rst
Outdated
|
||
class _BaseContext(metaclass=ABCMeta): | ||
|
||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this really just be a boolean property? I feel like if one was really approaching this API from scratch, a TLS client/server would take a PeerCertificateValidator
, which is it's own interface, which could be implemented in many different ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a good example of how one would do that with OpenSSL, or any idea of what that interface would be like? I don't have a good handle on what your mental picture of this interface looks like right now.
pep-0xxx.rst
Outdated
pass | ||
|
||
@abstractmethod | ||
def register_certificates(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Register doesn't seem like a great name to me.
pep-0xxx.rst
Outdated
def register_certificates(self, | ||
certificates: str, | ||
key=None: Optional[str], | ||
password=None: Optional[Callable[[], Union[AnyStr, bytearray]]]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these strings instead of objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The certificates? Mostly so I could avoid defining an X509Certificate object that I would then have to define an interface to. I suppose I could bootstrap an interface like that from cryptography.x509
. Accepting your bias regarding that interface for a moment (:wink:), does that sound like a better approach?
pep-0xxx.rst
Outdated
""" | ||
|
||
@abstractmethod | ||
def wrap_buffers(self, incoming: Any, outgoing: Any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incoming
and outgoing
can't really be Any
, can they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a good type definition for them, and IIRC the typing module doesn't yet have good support for interface-based typing.
pep-0xxx.rst
Outdated
|
||
class ClientContext(metaclass=ABCMeta): | ||
@abstractmethod | ||
def wrap_socket(self, socket: socket.socket, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do these APIs implement socket-specific bits at all? It seems like they could be written purely in terms of buffers, and then there could be a single implementation, shared across all TLS implementations, which turns the buffer-API into one that handles sockets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's unquestionably true. I suppose the main reason was to allow for the possibility that implementations may have their own socket-specific logic that they'd like to use. But I'm not 100% convinced there is value there.
pep-0xxx.rst
Outdated
|
||
@context.setter | ||
@abstractmethod | ||
def context(self, value: Context) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a bad API. It's clearly inherited from OpenSSL's approach to SNI (IIRC), but it should not exist in a platonic ideal TLS API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context-swapping API? Yeah, I can get behind removing that. I guess the question becomes: how do you request a change to the TLS settings based on the SNI parameter? Should you be able to do that?
pep-0xxx.rst
Outdated
|
||
The definitions of the errors are below:: | ||
|
||
class TLSError(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a very imprecise way to specify errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, it is. But I was worried about getting too specific here and constraining individual implementations. In general I'd suggest we only want to break out errors for things that applications could meaningfully recover from: otherwise they're all equally fatal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related discussion happened here: python-tls/tls#112 (review)
pep-0xxx.rst
Outdated
""" | ||
|
||
@abstractmethod | ||
def set_ciphers(self, ciphers: List[Ciphers]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SChannel's cipher configuration is a bit weird in a way that I don't think will work with an explicit list (see palgSupportedAlgs
here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'll still be fine: the SChannel implementation will just need to transform the list into the appropriate "pointer to array".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that I'm not sure what the appropriate transform would even be. The ALG_ID
doesn't identify a cipher, but rather individual algorithms. You can turn on or off ECDHE or SHA 256 for example, but not specifically TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh god I hadn't noticed that.
I don't think we can come up with a general interface to this without requiring that the SChannel backend work out how to make this work. It should be fairly possible to work out the union of algorithms that can be used for most cipher strings, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some thoughts written down on a possible way to support some kind of cipher filtering in a similar project: sfackler/rust-native-tls#4 (comment).
The usefulness does seem to be somewhat limited compared to full cipher selection, but it might still be worth it to be able to do things like disable non-ephemeral key exchange, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, so I'd say the alternative here is the same as you outline in your linked issue: construct this enum like SChannel and have OpenSSL and friends do the mapping instead. I think I'm not sure that moving to the lowest-common-denominator here is a good idea, especially given the inertia of what is already present in the system, but I'm prepared to be convinced otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like Windows's APIs do have the ability to explicitly enumerate cipher suites, but it looks like the configuration is global: https://msdn.microsoft.com/en-us/library/windows/desktop/bb870930(v=vs.85).aspx
attempt to export their system trust stores in some form. | ||
|
||
Relying on `certifi`_ is less than ideal, as most system administrators do | ||
not expect to receive security-critical software updates from PyPI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😟
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😟 indeed.
pep-0xxx.rst
Outdated
continue. Other return values signal errors. Attempting to control | ||
what error is signaled by the underlying TLS implementation is not | ||
specified in this API, but is up to the concrete implementation to | ||
handle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the intention here is that the callback does the same sslbuffer.context
switcheroo as we see in the ssl
module? If so that should be made explicit (it's not obvious that an implementation needs to support changing SSLContext
objects midstream!). I do wonder if there's a better option, also (is there any case where the new context differs in any way besides which certificate it presents? could we just like... return the certificate instead?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if you're splitting the context types than this should be part of ServerContext
, not _BaseContext
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...it looks like Go used to only support switching certificates in their equivalent of this callback, but in the next release is adding a new callback that allows arbitrary configuration to be changed. So... that seems like strong evidence that I'm wrong, and arbitrary reconfiguration is actually important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is a really good question. We definitely want to support at least swapping SSLContext object mid-stream, but it's worth considering whether a more explicit API would be better. I'm genuinely not sure.
pep-0xxx.rst
Outdated
@abstractmethod | ||
def wrap_socket(self, socket: socket.socket, server_side=False: bool, | ||
auto_handshake=True: bool, | ||
server_hostname=None: Optional[str]) -> TLSWrappedSocket: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compared to the stdlib module, this drops suppress_ragged_eofs
and session
arguments. The whole suppress_ragged_eofs
thing is a bit weird, but I guess should either be supported or else it should be explicit that this API acts like it was False
. session
seems important; I'm guessing it just got missed b/c the stdlib only added it in 3.6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I guess you should decide whether you want ServerContext
/ClientContext
to be split off, b/c right now you have inconsistent definitions of wrap_socket
and wrap_buffers
in the same document :-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real question is whether we need to doff our cap at the ssl
module at all. If it didn't exist, would you care that suppress_ragged_eofs
wasn't on this function?
As to session
, right now I don't think that users should be in charge of session reuse. I think that error exists in the ssl
module API due to historical quirks, and we shouldn't replicate it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the suppress_ragged_eofs
thing is clearly not a necessary feature. But in stdlib ssl
, the default is to ignore unclean shutdowns (suppress_ragged_eofs=True
), which is a security problem in some cases. So we should at the least provide an option to act like suppress_ragged_eofs=False
. The other option is to only act like suppress_ragged_eofs=False
. This means everyone implementing common protocols like HTTPS will need to catch and ignore the ragged EOF exceptions, and complicate porting from stdlib ssl
(people are probably depending on suppress_ragged_eofs=True
without realizing it). So... either an option or only-False
seem reasonable to me, but either way I think the spec needs to explicitly say what it's doing :-).
Regarding session
: I thought it needs to be exposed because e.g. servers may need to stick the session ticket in a database? I guess the alternative you're proposing is that each TLS implementation would hard-code some in-memory LRU ticket cache, maybe with some extension to tune the cache size etc.? This is beyond my TLS knowledge depth, just wanted to make sure the issue didn't get missed :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So... either an option or only-False seem reasonable to me, but either way I think the spec needs to explicitly say what it's doing :-).
I remain unconvinced of the need to do this up here. I can see wanting to provide an implementers note, but ultimately this seems like a really weird note to give. It's essentially saying: "be aware that the connection may be terminated in an unclean manner, and that may be treated as an error condition". This document really shouldn't become a "this is how TLS works" primer.
We absolutely should consider adding a section ("Porting Notes"?) that covers gotchas that may be encountered when moving from the stdlib ssl
module to this one. But I don't think annotating each method that differs from stdlib ssl
in any minor way about exactly how it differs is a sensible way of presenting the documentation of the interface.
For session
, the big risk here is that in an abstract API persisting session information is a very tricky thing to do. This is because it is clearly not enough to persist an abstract session ID, as that would provide little value across different processes. You need to persist all relevant information about the Session, and what exactly that data is and how it gets turned into a Session that a concrete implementation can restore is extremely esoteric. Essentially, if you're using the generic API, the idea of doing cross-process session caching is something you need to give up on, because you simply cannot meaningfully assume that your OpenSSL-based session can be quietly inserted into a GnuTLS-based process.
This is a problem for concrete APIs, and they can address it directly in their own implementations.
pep-0xxx.rst
Outdated
no behaviour. They exist simply to signal certain common behaviours. Backends | ||
should subclass these exceptions in their own packages, but needn't define any | ||
behaviour for them. These exceptions should *never* be thrown directly, they | ||
should always be subclassed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...Why is throwing them directly a problem? I mean, WantWrite
is WantWrite
, there's not much more to say...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The short answer is because I'd like people to behave like these classes are abstract, even though they aren't. =P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(But seriously, in order to be caught these exceptions will have to live in some concrete well-known package and implementations will have to use exceptions that inherit from that well-known package's concrete implementations, which is all totally fine but I don't see much benefit in pretending they're abstract?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair enough. I'd say the other reason to encourage the throwing of subclasses is to clarify what package is in use when an error occurs, but we can change the prohibition to a strong encouragement to do that instead.
Am I correct that as currently written, this API doesn't have any way to do clean shutdown, so it can't be used for protocols that are vulnerable to truncation attacks? (In the stdlib IIUC the |
@njsmith That's correct. Probably worth adding an optional abstract method to allow unwrap. |
pep-0xxx.rst
Outdated
|
||
class TLSWrappedBuffer(metaclass=ABCMeta): | ||
@abstractmethod | ||
def read(self, len=1024: int, buffer=None: Any) -> Union[bytes, int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It kind of feels a bit weird to have this sort of change based upon a buffer or not. How about two functions?
class TLSWrappedBuffer(metaclass=ABCMeta):
@abstractmethod
def read(self, len=1024: int) -> bytes:
pass
@abstractmethod
def read_into(self, buffer: Any, len=1024: int) -> int
pass
Also I don't like the len
name here since it shadows len()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two functions is fine.
""" | ||
|
||
@abstractmethod | ||
def shutdown(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is to take place of the write_eof()
function in MemoryBio
, but my question then becomes should we also expose a is_shutdown
boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. What value would it provide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CPython's ssl module has a shutdown method with a different meaning. It is used to shut down a connection on socket level, too.
https://docs.python.org/3/library/ssl.html#ssl-sockets
https://docs.python.org/3/library/socket.html#socket.socket.shutdown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I was just looking at the current MemoryBio
implementation it had a property to determine if the buffer has shutdown or not. I've never actually used the memory bio so I don't know, but presumably it'd be useful to prevent getting an exception if you've already closed the buffer or as a signal for other things that this is finished. I could imagine something like:
tls_buffer = ...
while not tls_buffer.is_shutdown:
data_to_write = a_write_queue.get(block=True)
if data_to_write is AShutdownSigil:
tls_buffer.shutdown()
else:
tls_buffer.write(data_to_write)
There are other ways to write this of course (a while True
loop with a break
comes to mind) but it feels useful to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CPython's ssl module has a shutdown method with a different meaning.
True. I should note that this shutdown
isn't on the socket object, it's on the buffer object. The socket object does not define a shutdown
. That said, I can see that there is potential for confusion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are other ways to write this of course (a while True loop with a break comes to mind) but it feels useful to me?
Useful, sure. But ultimately, I want to balance that utility against the size of this API. Each extra thing that goes into it is more code that we have to convince people to write and support. I'm not sure the utility of this justifies adding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha! It confused me already. :)
To increase general level of confusion, let's not forget that ssl.SSLObject
implements unwrap
to call shutdown
of _ssl._SSLSocket
, which internally calls SSL_shutdown()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
write_eof
is probably a slightly better name?
Also, this is effectively a variant of write
, so it should be documented to potentially raise the same exceptions, I think?
pep-0xxx.rst
Outdated
|
||
:: | ||
|
||
NextProtocol = namedtuple('NextProtocol', ['name']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it should be an enum.Enum
, like:
import enum
class NextProtocol(enum.Enum):
H2 = b"h2"
H2C = b"h2c"
HTTP1 = b"http/1.1"
WEBRTC = b"webrtc"
C_WEBRTC = b"c-webrtc"
FTP = b"ftp"
STUN = b"stun.nat-discovery"
TURN = b"stun.turn"
I would still allow passing arbitrary bytes though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, can't hurt.
pep-0xxx.rst
Outdated
:: | ||
|
||
class TLSVersion(Enum): | ||
MINIMUM_SUPPORTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this is a bit of a footgun. Maybe it would make sense to also add a SECURE_MINIMUM
and have that be the default value for TLS version minimum versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am very nervous about doing that. Adding anything with "secure" in the name basically forces python-dev into a position where they are once again tasked with making security-critical updates to a package. I would vastly prefer that this module didn't even pretend that it was making security decisions for you, such that we don't have to sell the idea of regularly updating the module.
We can still have a SECURE_MINIMUM
that is the default, it just shouldn't be the exposed. While I'm here I should note that create_default_context
could easily operate on the abstract context type, which would give us a way to enforce this in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that pretending that python-dev
isn't going to be in a position where they are tasked with making security-critical updates to a package is the same kind of short-sightedness that lead to the ssl
module not getting updated for new security considerations ever because it was possible to use it securely. We're going to have a default minimum and it's (hopefully) going to be a "secure" minimum so we're going to need to update it regardless of if we expose it or not. All exposing it does is allow someone to revert back to a secure minimum without having to hardcode that in their application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely opposed to publicly exposing a "secure minimum" because the definition of secure is too flexible with regard to TLS.
However, what's the scenario where a user would legitimately want to ask for MINIMUM_SUPPORTED
without being capable of typing TLSv1
(or, maybe for evil fuzzing purposes, SSLv3) themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I can probably warm up to removing MINIMUM_SUPPORTED
. It's mostly just there for symmetry right now.
pep-0xxx.rst
Outdated
""" | ||
Creates a Certificate object from a byte buffer. This byte buffer | ||
may be either PEM-encoded or DER-encoded. If the buffer is PEM | ||
encoded it *must* begin with the standard PEM preamble (a series of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the fairly common case where you have multiple certificates and a private key in the same file? In this case the private key might be first.
pep-0xxx.rst
Outdated
@abstractclassmethod | ||
def from_buffer(self, | ||
buffer: bytes, | ||
password=None: Optional[Union[Callable[[], Union[AnyStr, bytearray]], AnyStr, bytearray]) -> Certificate: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant to have this return a PrivateKey
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup.
pep-0xxx.rst
Outdated
""" | ||
Creates a PrivateKey object from a byte buffer. This byte buffer | ||
may be either PEM-encoded or DER-encoded. If the buffer is PEM | ||
encoded it *must* begin with the standard PEM preamble (a series of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above, what happens in the common case of certificates + private key in the same file, the certificate might come first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I'm not honestly sure. I'm not even honestly sure that we care that much.
The risk is that simply defining a function that takes a path/buffer and returns a list of Certificate and a PrivateKey forces our Certificate and PrivateKey objects to become concretized: that is, they need to be backed by actual X509 objects, because they can't just all point at the same file anymore. This hurts some possible optimisations (e.g. having Certificate objects actually just be a wrapper to the path to the cert for OpenSSL).
Ultimately, I think this "common case" is actually an OpenSSL special (I don't know that other backends support this, and some definitely don't e.g. SecureTransport), so I'm inclined to say that it doesn't belong in the generic implementation.
pep-0xxx.rst
Outdated
private key. It will only be called if the private key is encrypted | ||
and a password is necessary. It will be called with no arguments, | ||
and it should return a string, bytes, or bytearray. If the return | ||
value is a string it will be encoded as UTF-8 before using it to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels kind of wrong to me. Should we just mandates a bytes
or bytearray
? If the calling code has a str
it's pretty trivial for them to do a value.encode("utf8")
on their own without special casing utf8 here (which could possibly do the wrong thing if someone meant to say, encode it in shift-jis or so).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failure mode for getting this wrong here is really obvious and causes no threat. I don't think it hurts us to just tolerate this easy mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the failure mode is entirely non-obvious. It'll get exposed as your password not working and I think the typical end user is going to first double check they typed it correctly then they're going to try it in other applications and see it works there. I don't think it's obvious at all that they accidentally passed a str
instead of a bytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're going to try it in other applications that will encode it using SHIFT-JIS? Really? What application is that?
Are there in fact any applications that encode a password in any encoding other than UTF-8, Latin-1, or ASCII?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My assumption is that openssl isn't going to do any encoding at all and is going to just pull bytes off of stdin (or whatever). If their system is configured to use SHIFT-JIS then that will work, and Python will happily read()
that into a str
by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I mean the openssl
CLI, but I assume the C api doesn't either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the error case here is that the user has a non-UTF8 or ASCII locale, has typed a password using non-BMP characters, and then uses sys.stdin.read()
or the equivalent?
I am extremely unsure that that error case will ever come up, but I guess it's not literally impossible either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lukasa As I understand, and I could be wrong, using Shift-JIS is popular in Japan and the users who use it tend to have their locales setup for it. It doesn't seem that far out of line to assume that someone in Japan, using a Shift-JIS locale might use a Japanese letter in their password. For example:
PASSWORD = "学校"
assert PASSWORD.encode("utf8") == PASSWORD.encode("shift-jis")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok, I've already conceded this point. ;) It's in my task tracker as a TODO to remove this.
pep-0xxx.rst
Outdated
@abstractclassmethod | ||
def from_file(self, | ||
path: Union[pathlib.Path, bytes, str], | ||
password=None: Optional[Union[Callable[[], Union[AnyStr, bytearray]], AnyStr, bytearray]) -> Certificate: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, I think you meant to return a PrivateKey
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup.
pep-0xxx.rst
Outdated
Cipher Suites | ||
~~~~~~~~~~~~~ | ||
|
||
Todo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two ideas:
The OpenSSL cipher spec is not entirely unreasonable here, maybe we could make a few objects that basically implement that, but in a typed way. So that instead of:
ciphers = "ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS"
You would get
ciphers = (
(
((Ciphers.ECDH + Ciphers.AESGCM) & (Ciphers.DH + Ciphers.AESGCM)) |
((Ciphers.ECDH + Ciphers.ChaCha20) & (Ciphers.DH + Ciphers.ChaCha20))
) &
(Ciphers.ECDH + Ciphers.AES256) &
(Ciphers.DH + Ciphers.AES256) &
(Ciphers.ECDH + Ciphers.AES128) &
(Ciphers.DH + Ciphers.AES) &
(Ciphers.RSA + Ciphers.AESGCM) &
(Ciphers.RSA + Ciphers.AES) &
~Ciphers.aNULL &
~Ciphers.eNULL &
~Ciphers.MD5 &
~Ciphers.DSS
)
The above has a quick sketch of the possibility to define "equivalent" ciphers through the use of grouping and the |
statement. Obviously if the implementation doesn't support "equivalent" (do any of them? I know OpenSSL doesn't but I really wish it did...) they could just treat this as a flat list.
The other idea I had is using something like enum.Flag
.
...Also, I looked into NSS a little bit, and it seems like it could support buffer-mode just fine? It wants to work on a In fact AFAICT you have to do something like this anyway if you want to implement the |
I mean, I think the difficulties are pretty minor. It's a trivial runtime check in the library/application: I am not proposing that we wouldn't implement the wrapped buffers for SecureTransport, OpenSSL, SChannel and friends. I am just proposing that we wouldn't go so far in this PEP as to say "and here's a concrete TLSWrappedSocket object that operates in terms of TLSWrappedBuffer". We could unquestionably provide such code in supporting documentation, and provide encouragement to users to implement in that way, but it does not necessarily seem prudent to assume that we want to provide a concrete implementation in the standard library. Though again, I'm willing to be talked around on this issue. As to the keeping track of NSS thing, that's not an enormous surprise: I'll have to do the same for SecureTransport. |
OK, right, for me the potential value-add would be if we could drop
Go here and search for the string "changed in", then consider that if this PEP had been active at the time, then each of those lines represents a separate change that you would have to evaluate, and then potentially update the PEP and coordinate with all implementations to get the changes in... (Concrete example: currently socket objects have a relative timeout the applies to each call individually. Golang sockets have a way better idea, which is to have an absolute deadline after which calls start timing out. Python really should steal this feature. But this PEP as currently written would make it substantially harder to improve the Certainly providing a piece of common code for implementations to use could potentially help mitigate this. But there's still a huge difference in principle between "a piece of code in the stdlib maintained by python-dev and updated on their cycle" and "a formal specification with multiple implementations across different projects, some of which might happen to share some code". Writing a good spec is hard, or put another way, specs are a problem. Sometimes they're a necessary evil, but we should try to make them as small as possible :-). OTOH, as far as I know the only advantages we get from putting I get the feeling that you perceive some additional advantages to |
|
||
As at any time a re-negotiation is possible, a call to | ||
``readinto()`` can also cause write operations. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking of minimizing spec surface area – read
and readinto
seem redundant, since read
can be straightforwardly implemented in terms of readinto
. Is there some significant performance advantage to supporting both, or would it make sense to drop one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perversely, readinto
is not always trivial to implement in a useful way. In particular, CFFI frequently struggles with doing readinto
because it is often reluctant to allow you to pass a buffer object into C code thanks to PyPy's GC. With many CFFI tools, the reverse is what's done: readinto
is implemented in terms of read
.
And given that it is really easy to write both by just writing one and then implementing the other in terms of it, there is basically zero cost in specifying that the implementation must provide both, rather than requiring every implementer in the world who wants read
instead of readinto
to write their read
function. =)
So I can totally understand the desire to want to write a single TLSWrappedSocket implementation. My biggest concerns here are that in some cases we may be introducing dramatically more complexity to the stack by requiring support for buffers rather than allowing objects to just support sockets if they want to. Returning to NSS, while I believe it is possible to write an NSS implementation that works on buffers, I'd be disinclined to call it "easy" until I've seen one. However, it's always going to be easier to write an NSS implementation that wraps a socket because that's the default mode in which it is deployed by almost all applications. The same is true for SecureTransport: while it's definitely possible to implement it in terms of wrapped buffers, it's not trivial, in part because ST has a few assumptions predicated on the idea that the read/write callbacks will talk directly to FDs instead of to buffers. All of these can be coped with, of course, but they're non-trivial. Regardless, I'm going to keep tweaking this ST prototype I'm working on to see if we can get to a good place with the wrapped socket to use as a discussion point. |
I'm not sure which way I'm going to advocate here, but, one question is: Is providing a sans-IO layer a required part of compliance this specification, or only optional? If blocking is required but sans-IO is optional, then that's explicitly relegating asynchronous I/O to a second class citizen. (Which, granted, libnss has totally done; but, as I understand it, that's one of the things that derailed various attempts to get rid of OpenSSL in favor of it). If sans-IO is required, then what's the point in doubling implementation effort for every provider? For many backends, the "wrap a socket" library is the more heavily exercised code path, and may expose a superficially more convenient API; however, if an in-memory implementation is possible, then eschewing the native (potentially buggier, edge-case unfriendly, needlessly different in behavior) native implementations and having one common Python one seems like the way to go. |
Neither is required: both are optional. It wouldn't be very helpful to provide neither, of course. ;) In practice, I think the stdlib solutions would all choose to do both. |
Ok, so here is what I think is my suggestion for a compromise position. Why don't we keep the ABC for the socket, but also provide a concrete implementation that backends can use if they see fit? That way backends that want to do their own thing are free to do so, but backends that are happy to have their I/O implemented in terms of a Python socket object can get that done as well. I think I've convinced myself that I'm happy with the ability to write such a socket by playing about with SecureTransport, so I'm open to moving forward with this PEP by simply noting that such an implementation will be provided as a part of the implementation of this PEP. The major people who have expressed an interest in this are @glyph, @njsmith, and @tiran. How does that compromise position sit with the three of you? |
That's what I thought the original position was :). I'll try to clarify my own position: I think this is a great opportunity to send the message that it's not the TLS library's job to do I/O, that this is the sort of task best delegated to a high-level language rather than C, and avoid a bunch of potential bugs in the process. Synchronous I/O should be the second class citizen, not because I prefer asynchronous (surprise, I do) but because synchronous I/O is inherently an abstraction on top of a sans-IO API. However, I understand that this message is somewhat "against the grain", that some libraries might be insufficient to the task. Worth raising at this point, I think, is s2n. While it is my understanding that My perspective is that the clear message should be "s2n is insufficient for a python backend at this time, get on fixing 358 and when it has an abstraction layer it can be integrated". The key question, though, is whether this work wants to be prescriptive or descriptive: if it's designed to record the state of the art without too much editorializing, then providing both ABCs is necessary. If it's an opportunity to promote better design (which, I should stress, almost all TLS libraries are already capable of achieving even if it's the road less traveled) then we should only provide the one. |
I think I see where you're coming from here, but I've been trying to think through the cases and I'm not convinced that it's true. So first, there are cases like s2n (thanks @glyph!) that must do their own I/O. If we require support for buffers, then these libraries simply won't be supported by the standard Python TLS interface. This is somewhat unfortunate, but it definitely doesn't increase the complexity of anything – libraries that don't exist are always simpler than libraries that do :-). And I agree with glyph that there's no need to be held hostage by such libraries; the vast majority of TLS implementations do support buffers, including all the ones we want in the stdlib, and it sounds like s2n is planning to add this soon anyway. Next, there are cases like SecureTransport where you're saying that supporting buffers is non-trivial but doable. The SecureTransport backend has to support the sans-IO API though, because otherwise Twisted can't do TLS on MacOS – that's a feature-driven decision, it's irrelevant what the PEP says. So really the two options here are: either SecureTransport supports the buffer API and then uses generic code to convert that into a wrapped socket, or else SecureTransport supports the buffer API and also has its own private Finally, there are cases like NSS: it probably could support buffers, but if it's supported at all it'll be by some third-party wrapper on PyPI, and maybe the author of that wrapper happens to not care about Twisted and only needs the wrapped-socket support. If so, and if the PEP allows them the choice of between implementing the sans-IO API and then using a concrete socket wrapper, or else implementing the socket wrapper directly, they can pick whichever one makes life easier. And arguably:
But I'm not so sure this is true? Sure, if we just look at the size of the code implementing the core talking-to-NSS functionality, then obviously the version that uses NSS's existing socket APIs will be simpler than the version that implements a buffer interface. But that's not the whole story. For the buffer interface, once you implement the handful of methods described in the PEP, you're done. For the socket-wrapper, you then need to reimplement the entire Python socket API from scratch on top of the NSPR abstractions. This is seriously non-trivial, because the Python socket API is a beast: >>> len(dir(ssl.SSLSocket))
>>> 89 And then there's one more source of complexity to consider: realistically, you're not going to perfectly emulate the Python socket API; there's going to be weird corner cases you get wrong. (Like, "when this particular set of events occurs with this timing then in the regular wrapper implementation you get an OSError with errno X but in this implementation you get errno Y instead", that kind of thing.) So we also should count the complexity cost of downstream users who have to discover and come up with workarounds for these kinds of issues. In the buffers-only approach, this isn't an issue, because there's only one wrapper implementation that downstream users have to test against. So taking a holistic perspective and just looking at complexity alone, I don't see any cases where allowing backends to skip implementing the buffers interface is a clear win. |
@njsmith I feel like you wrote a very long post that didn't address my most recent suggestion, which is to keep the TLSWrappedSocket ABC but also provide a concrete default implementation. I understand and sympathise with @glyph's argument here, and but for NSS I think I'd be willing to bite. But I think NSS is a use-case we need to support: it's fairly widely deployed, and is useful to get stakeholder support. What I want to know is: how much does it hurt to do that? If the answer is "a lot" then I'm inclined to want to give NSS an escape hatch, whereas if the answer is "only a bit" then I'm much more willing to say "oh well, nvm then, we can make NSS contort to fit". I'm reluctant to be too ideological here: this PEP is ultimately about pragmatically attempting to solve the problem of Python distribution rather than a treatise on good system design, and so I'm entirely happy to compromise on the sans-IO approach in order to build a bigger tent. s2n doesn't bother me (it's not heavily used), but NSS does, and so if getting NSS to work with buffers is going to double or triple the workload required to get it done then I'm disinclined to impose that cost. I think someone needs to step up to investigate @njsmith's suggestion for now to make NSS be sans-IO'y, and to see if it actually works. |
I have to agree with Cory here: while I love sans-IO and async, I really hope we can keep these ideological considerations out of this PEP. Realistically speaking, most people care only about I/O-based wrappers (at least for now). Forcing an interface on them that treats this approach as a second class citizen (which usually means a performance penalty too) would just undermine the relevance and adoption of this PEP. And I really want this PEP to succeed for it seems like a viable way of Python’s valley of TLS sadness. IOW: define both, mandate neither. Let people pick what’s best for them and their TLS implementations. |
I'm even inclined to go a bit further than what @hynek wants and to provide a socket-in-terms-of-buffer object, and encourage implementations that want to to use it, but to just not slam the door in the face of those that want to just provide the wrapped-I/O approach. |
Having reviewed the NSS docs a bit now, I can't say for sure, but it seems like there are two answers. From the perspective of "how hard is it for someone who doesn't really know NSS to do", all of the example code for NSS does use blocking sockets, and so that would be the easiest to find some sample to crib from. However, NSS depends heavily on NSPR, and NSPR has a robust and well-documented I/O abstraction layer. If you have a good grasp of how these layers interact with each other, then doing the work to put together a buffered I/O abstraction would be only very slightly more difficult (as you'd have to implement a few callbacks to populate your buffer). So while it's a little harder to find an example, the implementation work is nearly equivalent. |
@Lukasa: I'm confused about why you're confused :-). My understanding is: you propose that we allow (but don't require) backends to take full responsibility for implementing a socket wrapper object, and make the buffer-based interface optional. Your main motivation for this is that for backends like NSS (or maybe... just NSS), you think this might be less complex than requiring them to implement the buffer-based interface. My concern about this isn't ideological; it's that the Python socket interface is extremely complicated and constantly evolving, so in practice backends are going to get it wrong and get out of sync with the stdlib, and user code is going to get stuck debugging broken socket APIs. As Glyph can tell you, socket APIs are already a mess to work with without us adding more broken wrappers :-). Plus, it provides only partial functionality compared to the buffers approach. So I think we should only allow this (even as a "compromise") if we really are getting a benefit from it. And you claim that benefit is in terms of complexity, so my message was an analysis of complexity :-). And the main point is that I disagree with your assessment: AFAICT, even NSS actually isn't going to be easier to implement a socket wrapper directly. Sure, there are better examples for how to use the NSS API in the basic direct-blocking-I/O mode, but that's only half the battle. If you aren't going to use the buffers-to-socket-wrapper adapter, then you're also stuck implementing the whole Python socket API from scratch. Those example don't show how to use the NSPR API to correctly emulate Python's semantics for Am I correctly understanding your argument? If so, does that make sense as a critique? |
Yes, and yes. What I remain unsure about is whether we really want to go so far as to say "and therefore we will not provide a wrapped socket ABC but only a concrete implementation". I buy the argument that says "wrapping sockets is hard, don't do it yourself", but I don't know that I buy the extension that says "and therefore we will not spec it". Still, perhaps I'm just wrong on this and haven't fully internalised that yet. |
Ok, I'm going to put aside my own misgivings for a moment. I've reworked the PEP to include the concrete wrapped socket. I'm going to run this past the security-SIG in that form once more to sanity-check before I take it to python-dev. |
✨ to @Lukasa for doing all this work so far. |
Regarding the problem of making a buffer-based wrapper for NSS, I spent a few minutes grepping the code. Figured I might as well right it down here in case it saves someone some time in the future... As far as I can tell, these are the socket callbacks that you have to implement (i.e., the ones that NSS will actually call at some point during normal usage):
There also appears to be some code that blocks the handshake while it goes off to do an OCSP check, but I believe that's disabled by default. So you have to implement the buffering logic and of course all the TLS configuration and such, and then beyond that there are a few extra fiddly bits, but it doesn't look too bad. |
This pull request tracks my work on the PEP for the TLS ABCs. This PEP is focused on defining a limited interface that can be implemented by any TLS backend for Python, allowing Python network implementations that don't have specific low-level requirements to obtain support for multiple TLS backends.
I've put this up as a GitHub pull request to allow anyone who is interested to provide feedback before I post this PEP to python-dev. My intended flow for this PEP is as follows:
Anyone who likes may leave inline comments on this diff. Shout if you'd like to be able to use GitHub's review feature and I can add you as a collab on this repository to make it a bit easier.