Directory Documents
The bushel.directory.document module provides base classes and utility methods for handling documents that implement the Tor directory protocol version 3 meta format (§1.2 [dir-spec]).
For specific document types, see:
-
class bushel.directory.document.DirectoryCertificate(raw_content)
A Tor Ed25519 certificate as specified by [cert-spec]. It is not the only certificate format that Tor uses. Typically these are found as the data contained within DirectoryDocumentObjects.
Parameters: raw_content (bytes) – raw certificate contents
Variables: - data (bytes) – raw certificate contents
- version (int) – version of the certificate format (currently always 1)
- cert_type (int) – type of certificate
- expiration_date (datetime) – expiration date of certificate
- cert_key_type (int) – type of certified key
- certified_key (bytes) – an Ed25519 public key if cert_key_type is 1, or a SHA256 hash of some other key type depending on the value of cert_key_type
- n_extensions (int) – declared number of extensions
- extensions (list(DirectoryCertificateExtension)) – parsed extensions
- signature (bytes) – certificate signature
-
is_valid()
Checks that the certificate is valid. This is the counterpart to verify(): it checks that the certificate data conforms to the specification, whereas verify() checks only the signature. The two checks performed are:
- the expiration date has not passed
- there are no extensions that affect validation that we do not understand
Note
In the Tor Metrics use case, we need to check that certificates were valid at the time they were expected to be valid, but the current API does not support this.
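The two checks, and the point-in-time variant that the note asks for, can be sketched roughly as follows. This is not part of bushel: `is_valid_at`, `Extension`, and `KNOWN_EXTENSION_TYPES` are hypothetical names, and the sketch assumes timezone-aware datetimes and the AFFECTS_VALIDATION flag bit (value 1) from [cert-spec].

```python
from datetime import datetime, timezone
from typing import Iterable, NamedTuple, Optional

AFFECTS_VALIDATION = 0x01       # ExtFlags bit from [cert-spec]
KNOWN_EXTENSION_TYPES = {0x04}  # assumption: only signed-with-ed25519-key is understood


class Extension(NamedTuple):
    """Hypothetical stand-in for a parsed certificate extension."""
    ext_type: int
    ext_flags: int


def is_valid_at(expiration_date: datetime,
                extensions: Iterable[Extension],
                when: Optional[datetime] = None) -> bool:
    """Apply both validity checks as of `when` (defaults to the current time)."""
    if when is None:
        when = datetime.now(timezone.utc)
    # Check 1: the expiration date has not passed.
    if expiration_date <= when:
        return False
    # Check 2: no unknown extension claims to affect validation.
    for ext in extensions:
        unknown = ext.ext_type not in KNOWN_EXTENSION_TYPES
        if unknown and ext.ext_flags & AFFECTS_VALIDATION:
            return False
    return True
```

Passing `when` explicitly is one way a caller could ask whether a certificate was valid at the time a document was published.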
-
parse()
Parses the certificate to make the fields available via instance attributes. This does not validate or verify the certificate, but must be called before making calls to is_valid() or verify().
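To illustrate what parsing involves, here is a minimal standalone sketch of decoding the fields listed above from the binary layout in [cert-spec]. It is not bushel's implementation: `parse_certificate` and the dictionary keys are hypothetical, and the sketch assumes the expiration field counts hours since the Unix epoch, as [cert-spec] describes.

```python
import struct
from datetime import datetime, timezone


def parse_certificate(data: bytes) -> dict:
    """Decode an Ed25519 certificate into a plain dict (illustrative only)."""
    # Fixed header: version, cert type, expiration (hours), certified key type.
    version, cert_type, expiration_hours, cert_key_type = struct.unpack_from(">BBIB", data, 0)
    certified_key = data[7:39]   # 32-byte key or SHA256 digest
    n_extensions = data[39]
    offset = 40
    extensions = []
    for _ in range(n_extensions):
        # Each extension: 2-byte length, 1-byte type, 1-byte flags, then data.
        ext_length, ext_type, ext_flags = struct.unpack_from(">HBB", data, offset)
        offset += 4
        ext_data = data[offset:offset + ext_length]
        if len(ext_data) != ext_length:
            raise ValueError("truncated extension")
        extensions.append({"type": ext_type, "flags": ext_flags, "data": ext_data})
        offset += ext_length
    signature = data[offset:]
    if len(signature) != 64:
        raise ValueError("expected a 64-byte Ed25519 signature")
    return {
        "version": version,
        "cert_type": cert_type,
        "expiration_date": datetime.fromtimestamp(expiration_hours * 3600, tz=timezone.utc),
        "cert_key_type": cert_key_type,
        "certified_key": certified_key,
        "n_extensions": n_extensions,
        "extensions": extensions,
        "signature": signature,
    }
```

Like parse() itself, this only extracts fields; it performs neither the validity checks nor signature verification.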
-
verify(verify_key_data=None)
Verify the certificate using the verification key. Optionally provide key material; otherwise the key found in the “signed-with-ed25519-key” (type 4) extension will be used.
This only verifies the signature. To validate the certificate data, the separate DirectoryCertificate.is_valid() method must be used.
Warning
This verifies the raw data that the object was initialized with. The fields may have been modified since parsing, and the parser may also have unknown bugs.
Parameters: verify_key_data (bytes) – an Ed25519 verification key
-
class bushel.directory.document.DirectoryCertificateExtension
A Tor Ed25519 certificate extension as specified by [cert-spec].
Variables:
See also
These will be found in DirectoryCertificates.
-
class bushel.directory.document.DirectoryDocument(raw_content)
A directory document as described in the Tor directory protocol meta format (§1.2 [dir-spec]).
Parameters: raw_content (bytes) – raw document contents
-
tokenize()
Tokenizes the document using the following tokens:

=========  =======================================  ========
Kind       Matches on                               Value
=========  =======================================  ========
END        "-----END " Keyword "-----"              Keyword
BEGIN      "-----BEGIN " Keyword "-----"            Keyword
NL         The ASCII LF character (hex value 0x0a)  Raw data
PRINTABLE  Printing, non-whitespace, UTF-8          Raw data
WS         Space or tab                             Raw data
MISMATCH   Anything else (likely binary nonsense)   Raw data
=========  =======================================  ========

Note that these tokens do not match the non-terminals exactly as they are specified in the Tor directory protocol meta format. In particular, the PRINTABLE token is used for both keywords and arguments (and object data). It is up to whatever is processing these tokens to decide whether something is a valid keyword or argument.
>>> document_bytes = b'''super-keyword 3
... onion-magic
... -----BEGIN ONION MAGIC-----
... AQQABp6MAT7yJjlcuWLDbr8A5J8YgyDh5SPYkLpj7fmcBaFbKekjAQAgBADKnR/C
... -----END ONION MAGIC-----
... '''
>>> for token in DirectoryDocument(document_bytes).tokenize():
...     print(token) # doctest: +ELLIPSIS
DirectoryDocumentToken(kind='PRINTABLE', value='super-keyword', line=1, column=0)
DirectoryDocumentToken(kind='WS', value=' ', line=1, column=13)
DirectoryDocumentToken(kind='PRINTABLE', value='3', line=1, column=14)
DirectoryDocumentToken(kind='NL', value='\n', line=1, column=15)
DirectoryDocumentToken(kind='PRINTABLE', value='onion-magic', line=2, column=0)
DirectoryDocumentToken(kind='NL', value='\n', line=2, column=11)
DirectoryDocumentToken(kind='BEGIN', value='ONION MAGIC', line=3, column=0)
DirectoryDocumentToken(kind='PRINTABLE', value='AQQ...DKnR/C', line=4, column=0)
DirectoryDocumentToken(kind='NL', value='\n', line=4, column=64)
DirectoryDocumentToken(kind='END', value='ONION MAGIC', line=5, column=0)
DirectoryDocumentToken(kind='EOF', value=None, line=6, column=0)
Returns: iterator for DirectoryDocumentToken
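For readers who want to see how a tokenizer of this shape can be built, the following is a standalone sketch based on a single regular expression with named groups. It is an assumption about the general approach, not bushel's code: `Token`, `TOKEN_SPEC`, and `tokenize_sketch` are hypothetical names, and the sketch emits an NL token after BEGIN and END lines, which may differ from the module's own output.

```python
import re
from typing import Iterator, NamedTuple, Optional


class Token(NamedTuple):
    kind: str
    value: Optional[str]
    line: int
    column: int


# Order matters: BEGIN/END must be tried before PRINTABLE would swallow the dashes.
TOKEN_SPEC = [
    ("END", r"-----END [^\n]+?-----"),
    ("BEGIN", r"-----BEGIN [^\n]+?-----"),
    ("NL", r"\n"),
    ("PRINTABLE", r"[^\s\x00-\x1f\x7f]+"),
    ("WS", r"[ \t]+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize_sketch(raw: bytes) -> Iterator[Token]:
    """Yield tokens with 1-based line numbers and 0-based columns."""
    text = raw.decode("utf-8", errors="replace")
    line, line_start = 1, 0
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        value = match.group()
        column = match.start() - line_start
        # BEGIN and END carry only the object keyword as their value.
        if kind == "BEGIN":
            value = value[len("-----BEGIN "):-len("-----")]
        elif kind == "END":
            value = value[len("-----END "):-len("-----")]
        yield Token(kind, value, line, column)
        if kind == "NL":
            line += 1
            line_start = match.end()
    yield Token("EOF", None, line, len(text) - line_start)
```

Because PRINTABLE covers keywords, arguments, and object data alike, deciding what each run of characters means is left to the consumer of the tokens, as described above.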
-
-
class bushel.directory.document.DirectoryDocumentItem(keyword, arguments, objects, errors)
A directory document item as described in the Tor directory protocol meta format (§1.2 [dir-spec]).
Parameters:
Variables:
-
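class bushel.directory.document.DirectoryDocumentItemError
Enumeration of forgivable errors that may be encountered during itemization of a directory document.

===================  ===================================================================
Name                 Description
===================  ===================================================================
TRAILING_WHITESPACE  Trailing whitespace on KeywordLines https://bugs.torproject.org/30105
===================  ===================================================================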
-
class bushel.directory.document.DirectoryDocumentItemizer(allowed_errors=None)
Parses DirectoryDocumentTokens into DirectoryDocumentItems. By default this is a strict implementation of the Tor directory protocol meta format (§1.2 [dir-spec]), but this can be relaxed to account for implementation bugs in known Tor implementations.
Items are produced by processing tokens according to a state machine.
State transitions shown in red would ideally not be needed as they are protocol violations, but implementations of the protocol exist that produce documents requiring these transitions and we need to be bug compatible.
Warning
All printable strings are treated equally right now, so we’re not testing for keywords being the restricted set, nor are we decoding object data yet.
Parameters: allowed_errors (list(DirectoryDocumentItemError)) – A list of errors that will be considered non-fatal during itemization.
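The idea of a strict itemizer with an allow-list of forgivable errors can be sketched as below. This is a deliberately simplified, line-based illustration rather than the token-driven state machine bushel uses; `ItemError`, `Item`, and `itemize_sketch` are hypothetical names chosen for the example.

```python
from enum import Enum
from typing import List, NamedTuple, Optional, Tuple


class ItemError(Enum):
    TRAILING_WHITESPACE = 1


class Item(NamedTuple):
    keyword: str
    arguments: List[str]
    objects: List[Tuple[str, List[str]]]  # (object keyword, data lines)
    errors: List[ItemError]


def itemize_sketch(text: str, allowed_errors: Optional[List[ItemError]] = None) -> List[Item]:
    """Group keyword lines and their objects into items; strict by default."""
    allowed = set(allowed_errors or [])
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the document ends with a newline
    items: List[Item] = []
    current_object = None  # set while between BEGIN and END lines
    for number, line in enumerate(lines, start=1):
        if current_object is not None:
            keyword, data_lines = current_object
            if line == f"-----END {keyword}-----":
                items[-1].objects.append(current_object)
                current_object = None
            else:
                data_lines.append(line)
            continue
        if line.startswith("-----BEGIN ") and line.endswith("-----"):
            if not items:
                raise ValueError(f"line {number}: object before any keyword line")
            current_object = (line[len("-----BEGIN "):-len("-----")], [])
            continue
        errors = []
        stripped = line.rstrip(" \t")
        if stripped != line:
            # Forgivable only if the caller explicitly allowed it.
            if ItemError.TRAILING_WHITESPACE not in allowed:
                raise ValueError(f"line {number}: trailing whitespace")
            errors.append(ItemError.TRAILING_WHITESPACE)
        parts = stripped.split()
        if not parts or stripped[0] in " \t":
            raise ValueError(f"line {number}: malformed keyword line")
        items.append(Item(parts[0], parts[1:], [], errors))
    if current_object is not None:
        raise ValueError("unterminated object at end of document")
    return items
```

Recording tolerated errors on each item, instead of silently ignoring them, lets a caller report which documents relied on the relaxed behaviour.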
-
class bushel.directory.document.DirectoryDocumentObject
A directory document object as described in the Tor directory protocol meta format (§1.2 [dir-spec]).
Variables:
-
bushel.directory.document.decode_object_data(lines)
Decodes the base64 encoded data found within directory document objects.
Parameters: lines (list(str)) – the lines as found in a directory document object, not including newlines or the begin/end lines
Returns: the decoded data
Return type: bytes
-
bushel.directory.document.encode_object_data(data)
Encodes bytes using base64 and wraps the lines at 64 characters.
Parameters: data (bytes) – the data to be encoded
Returns: the line-wrapped base64 encoded data as a list of strings, one string per line
Return type: list(str)
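The behaviour described for these two helpers can be approximated with the standard library alone. The functions below are a sketch under that assumption, with hypothetical names (`encode_lines`, `decode_lines`) so they are not mistaken for bushel's own implementations.

```python
import base64
from typing import List


def encode_lines(data: bytes, width: int = 64) -> List[str]:
    """Base64-encode data and split the result into lines of at most `width` characters."""
    encoded = base64.b64encode(data).decode("ascii")
    return [encoded[i:i + width] for i in range(0, len(encoded), width)]


def decode_lines(lines: List[str]) -> bytes:
    """Join object data lines (no newlines, no BEGIN/END lines) and base64-decode them."""
    return base64.b64decode("".join(lines), validate=True)
```

Encoding and then decoding should return the original bytes, which makes the pair easy to check with a round trip.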
-
bushel.directory.document.parse_timestamp(item, argindex=0)
Parses a timestamp from a directory document’s item using the common format from [dir-spec]. This format is not defined explicitly but is used with many keywords including valid-after, fresh-until, and valid-until.
Note
Due to the way the tokenizer works, timestamps are parsed as two arguments split by whitespace. This function takes this into account when parsing the timestamp.
Most items will have the timestamp as the first argument on the keyword line. At the time of writing, there are no keywords defined that expect timestamps at other indexes. Should this be required though, argindex may be used to parse a timestamp from a later argument.
Parameters: - item (DirectoryDocumentItem) – the directory document item
- argindex (int) – zero-based index of the date portion of the timestamp; the time portion is expected at argindex+1
Returns: the parsed timestamp
Return type:
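A minimal sketch of the two-argument handling described above follows. It assumes the timestamp format is `YYYY-MM-DD HH:MM:SS` and returns a naive datetime; `Item` and `parse_timestamp_sketch` are hypothetical stand-ins, not bushel's classes or its actual return type.

```python
from datetime import datetime
from typing import List, NamedTuple


class Item(NamedTuple):
    """Hypothetical stand-in for a directory document item."""
    keyword: str
    arguments: List[str]


def parse_timestamp_sketch(item: Item, argindex: int = 0) -> datetime:
    """Rejoin the date and time arguments and parse them as one timestamp."""
    parts = item.arguments[argindex:argindex + 2]
    if len(parts) != 2:
        raise ValueError(f"{item.keyword}: expected date and time arguments at index {argindex}")
    return datetime.strptime(" ".join(parts), "%Y-%m-%d %H:%M:%S")
```

For an item such as `valid-after 2019-04-01 12:00:00`, the tokenizer yields the date and the time as separate arguments, so the function rejoins them before parsing.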