Commit
* Announce that we support file transfers
* Pass existing BytesIO instead of creating it anew everywhere
* Respond to file transfer requests
* It can upload some data
* Don't look for acknowledgement when server cancelled the upload
* Allow to check for cancellation
* Do not include \n in the file name
* Allow uploading through a file-like object
* Return early if cancelled
* Mock _getblock_raw instead of _getblock. The latter is no longer called by mapi.Connection.cmd().
* Fix type error found by mypy
* Add some asserts to silence mypy
* Make pycodestyle happy
* Make abstract base class for uploaders
* Add cancel method to Uploader
* Support downloads
* Refactor: chunk_left -> chunk_used
* Add set_chunk_size
* Complain if the handler didn't do anything
* Fix missing 'return' keyword
* Avoid empty writes
* When cancelled, pretend to write all bytes. Before we returned 0 but that leads to endless loops when called by the utf-8 codec.
* Start porting the file transfer tests from monetdb-jdbc. They are very useful.
* whitespace
* Silly copy pasta
* More tests
* Use TextIOWrapper() directly instead of calling codecs.getreader(). This allows us to force the line endings to "\n"
* Rename {Up,Down}loader.handle to handle_{up,down}load, so a class can implement Uploader and Downloader simultaneously
* Disconnect if the upload handler throws an exception. We must make sure that the server doesn't think the upload ended successfully and unfortunately the only way to do that is by killing the connection.
* Disconnect if the download handler throws an exception
* Normalize \r\n to \n in uploaded text
* Export Uploader and Downloader from the pymonetdb toplevel
* Add doc strings
* Reset default chunk size to 1 MiB
* Minor fixes
* Small fixes for pycodestyle and mypy
* Move file transfer code to separate module
* cleanup
* Add DefaultHandler
* Test uploads with DefaultHandler
* Thank you, mypy
* Path.is_relative_to was only introduced in Python 3.9
* Fix bad mistake in test
* Test downloads with DefaultHandler
* Delete leftover line of code
* Stop on end of file while skipping
* Roll back between subtests
* Test various skip amounts
* Do not forget to acknowledge when uploading empty file
* Add timeout mechanism to catch hanging tests
* Test empty downloads
* Fix bug in empty downloads
* Test the CR LF normalizer and fix some bugs
* Test DefaultHandler security
* Split generic tests and default handler tests
* Expand generated subtests into standalone
* Uncompress files automatically
* Also test default download handler
* Demonstrate that _getblock_socket is dead code
* Remove the dead code
* Simplify buffer management
* Update the documentation
* Document compression support and allow to disable it
* Combine the standalone subtests into a generator again. Having them separate was useful while there were many bugs; now that it mostly works, conciseness is more important.
* Test it on Windows. Line endings are different there.
* Fix the paths
* Drop close_fds
* add future dependency hoping it can then import 'past'
* do not capture stderr
* Record server stderr
* We'd really like to be able to see the server stderr
* fixes
* syntax
* Show the default encoding
* Encodings
* Be more careful with the default encoding
* When testing text uploads in binary mode, make sure it's utf-8
* Proofreading changes
* Remove duplicate code
* Rename DefaultHandler to SafeDirectoryHandler
* Improve the documentation
* Some gra corrections.
* Use with-block in example
* Clarify :meta private: in docstring
* Avoid shadowing 'mapi' import with 'mapi' parameter
* Clean unused imports
* Remove circular dependencies between .mapi and .filetransfer
* Add more type annotations
* Removed nested try block
* Start splitting filetransfer.py in separate files
* Split up the filetransfer module
* Adjust api.rst to new module structure
* Add documentation for handle_download

Co-authored-by: lrpereira <[email protected]>
1 parent a2f7166, commit ea38611
Showing 22 changed files with 2,264 additions and 57 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
name: Windows Test | ||
on: | ||
  push: | ||
    branches: [ master ] | ||
  pull_request: | ||
    branches: [ master ] | ||
jobs: | ||
  runtests: | ||
    runs-on: windows-2019 | ||
    steps: | ||
      - uses: actions/checkout@v2 | ||
|
||
      - uses: actions/setup-python@v1 | ||
        with: | ||
          python-version: '3.6' | ||
|
||
      # Unfortunately msiexec does not seem to work on the windows runner. | ||
      # 7zip is able to unpack .msi files but it loses the directory structure. | ||
      # We fix that in the next step. | ||
      - name: Download MonetDB | ||
        run: | | ||
          curl https://www.monetdb.org/downloads/Windows/Jan2022-SP1/MonetDB5-SQL-Installer-x86_64-20220207.msi -o ${{ runner.temp }}\monetdb.msi --no-progress-meter | ||
          dir ${{ runner.temp }} | ||
          7z x ${{ runner.temp }}\monetdb.msi -o${{ runner.temp }}\staging | ||
          dir ${{ runner.temp }}\staging | ||
      # Run a script to restore the directory structure and see if it works (a little) | ||
      - name: Install MonetDB | ||
        run: | | ||
          python tests/install_monetdb_from_msi_dir.py ${{ runner.temp }}\staging ${{ runner.temp }}\MONET | ||
          dir ${{ runner.temp }}\MONET | ||
          dir ${{ runner.temp }}\MONET\bin | ||
          ${{ runner.temp }}\MONET\bin\mserver5.exe --help | ||
      - name: Setup virtual environment | ||
        run: | | ||
          python -m venv venv | ||
          venv\Scripts\Activate.ps1 | ||
          python -m pip install -r tests/requirements.txt | ||
      # Script tests/windows_tests.py starts an mserver in the background | ||
      # and runs pytest, excluding the Control tests. | ||
      - name: run the tests | ||
        run: | | ||
          venv\Scripts\Activate.ps1 | ||
          mkdir ${{ runner.temp }}\dbfarm | ||
          python tests/windows_tests.py ${{ runner.temp }}\MONET ${{ runner.temp }}\dbfarm demo 50000 | ||
          echo ""; echo ""; echo "================ SERVER STDERR: ==================="; echo "" | ||
          type ${{ runner.temp }}\dbfarm\errlog | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
development | ||
Development | ||
=========== | ||
|
||
Github | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
#!/usr/bin/env python3 | ||
|
||
import os | ||
import pymonetdb | ||
|
||
# Create the data directory and the CSV file | ||
try: | ||
    os.mkdir("datadir") | ||
except FileExistsError: | ||
    pass | ||
with open("datadir/data.csv", "w") as f: | ||
    for i in range(10): | ||
        print(f"{i},item{i + 1}", file=f) | ||
|
||
# Connect to MonetDB and register the upload handler | ||
conn = pymonetdb.connect('demo') | ||
handler = pymonetdb.SafeDirectoryHandler("datadir") | ||
conn.set_uploader(handler) | ||
cursor = conn.cursor() | ||
|
||
# Set up the table; IF EXISTS avoids an error on the first run | ||
cursor.execute("DROP TABLE IF EXISTS foo") | ||
cursor.execute("CREATE TABLE foo(i INT, t TEXT)") | ||
|
||
# Upload the data, this will ask the handler to upload data.csv | ||
cursor.execute("COPY INTO foo FROM 'data.csv' ON CLIENT USING DELIMITERS ','") | ||
|
||
# Check that it has loaded | ||
cursor.execute("SELECT t FROM foo WHERE i = 9") | ||
row = cursor.fetchone() | ||
assert row[0] == 'item10' | ||
|
||
# Goodbye | ||
conn.commit() | ||
cursor.close() | ||
conn.close() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
#!/usr/bin/env python3 | ||
import pymonetdb | ||
|
||
class MyUploader(pymonetdb.Uploader): | ||
    def handle_upload(self, upload, filename, text_mode, skip_amount): | ||
        tw = upload.text_writer() | ||
        for i in range(skip_amount, 1000): | ||
            print(f'{i},number{i}', file=tw) | ||
|
||
conn = pymonetdb.connect('demo') | ||
conn.set_uploader(MyUploader()) | ||
|
||
cursor = conn.cursor() | ||
cursor.execute("DROP TABLE IF EXISTS foo") | ||
cursor.execute("CREATE TABLE foo(i INT, t TEXT)") | ||
cursor.execute("COPY 10 RECORDS OFFSET 7 INTO foo FROM 'data.csv' ON CLIENT USING DELIMITERS ','") | ||
cursor.execute("SELECT COUNT(i), MIN(i), MAX(i) FROM foo") | ||
row = cursor.fetchone() | ||
print(row) | ||
assert row[0] == 10 # ten records were loaded | ||
assert row[1] == 6 # offset 7 means skip the first 6, that is, records 0, ..., 5 | ||
assert row[2] == 15 # 10 records: 6, 7, 8, 9, 10, 11, 12, 13, 14 and 15 | ||
|
||
# Goodbye | ||
conn.commit() | ||
cursor.close() | ||
conn.close() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
#!/usr/bin/env python3 | ||
import pathlib | ||
import shutil | ||
import pymonetdb | ||
|
||
class MyUploader(pymonetdb.Uploader): | ||
    def __init__(self, dir): | ||
        self.dir = pathlib.Path(dir) | ||
|
||
    def handle_upload(self, upload, filename, text_mode, skip_amount): | ||
        # security check | ||
        path = self.dir.joinpath(filename).resolve() | ||
        if not str(path).startswith(str(self.dir.resolve())): | ||
            return upload.send_error('Forbidden') | ||
        # open | ||
        tw = upload.text_writer() | ||
        with open(path) as f: | ||
            # skip | ||
            for i in range(skip_amount): | ||
                f.readline() | ||
            # bulk upload | ||
            shutil.copyfileobj(f, tw) | ||
|
||
conn = pymonetdb.connect('demo') | ||
conn.set_uploader(MyUploader('datadir')) | ||
|
||
cursor = conn.cursor() | ||
cursor.execute("DROP TABLE IF EXISTS foo") | ||
cursor.execute("CREATE TABLE foo(i INT, t TEXT)") | ||
cursor.execute("COPY 10 RECORDS OFFSET 7 INTO foo FROM 'data.csv' ON CLIENT USING DELIMITERS ','") | ||
cursor.execute("SELECT COUNT(i), MIN(i), MAX(i) FROM foo") | ||
row = cursor.fetchone() | ||
print(row) | ||
|
||
# Goodbye | ||
conn.commit() | ||
cursor.close() | ||
conn.close() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,120 @@ | ||
File Transfers | ||
============== | ||
|
||
MonetDB supports the non-standard :code:`COPY INTO` statement to load a CSV-like | ||
text file into a table or to dump a table to a text file. This statement has an | ||
optional modifier :code:`ON CLIENT` to indicate that the server should not | ||
try to open the file server-side, but should instead ask the client to open the | ||
file on its behalf. | ||
|
||
For example:: | ||
|
||
    COPY INTO mytable FROM 'data.csv' ON CLIENT | ||
    USING DELIMITERS ',', E'\n', '"'; | ||
|
||
By default, if pymonetdb receives a file request from the server, it will refuse | ||
it for security reasons. You do not want the server, or a hacker pretending | ||
to be the server, to be able to request arbitrary files on your system and even | ||
overwrite them. | ||
|
||
To enable file transfers, create a `pymonetdb.Uploader` and/or | ||
`pymonetdb.Downloader` and register them with your connection:: | ||
|
||
    transfer_handler = pymonetdb.SafeDirectoryHandler(datadir) | ||
    conn.set_uploader(transfer_handler) | ||
    conn.set_downloader(transfer_handler) | ||
|
||
With this in place, the COPY INTO ON CLIENT statement above will ask to open | ||
file data.csv in the given `datadir` and upload its contents. As its name | ||
suggests, :class:`SafeDirectoryHandler` will only allow access to the files in | ||
that directory. | ||
|
||
Note that in this example we register the same handler object both as an | ||
uploader and a downloader, but it is perfectly sensible to only register an | ||
uploader, or only a downloader, or to use two separate handlers. | ||
|
||
See the API documentation for details. | ||
|
||
|
||
Make up data as you go | ||
---------------------- | ||
|
||
You can also write your own transfer handlers. Instead of opening a file, | ||
such handlers can make up the data on the fly, retrieve it from a remote | ||
microservice, prompt the user interactively, or do whatever else you come up | ||
with: | ||
|
||
.. literalinclude:: examples/uploaddyn.py | ||
   :pyobject: MyUploader | ||
|
||
In this example we called `upload.text_writer()` which yields a text-mode | ||
file-like object. There is also `upload.binary_writer()` which yields a | ||
binary-mode file-like object. This works even if the server requested a text | ||
mode object, but in that case you have to make sure the bytes you write are valid | ||
utf-8 and delimited with Unix line endings rather than Windows line endings. | ||
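As a minimal sketch of that requirement, the helper below converts text with Windows line endings into UTF-8 bytes with Unix line endings before they are passed to the `write()` method of a binary writer. The name `to_wire_bytes` is hypothetical and not part of pymonetdb:

```python
def to_wire_bytes(text: str) -> bytes:
    """Return `text` as UTF-8 bytes with Unix line endings.

    Hypothetical helper for illustration; not part of pymonetdb.
    """
    # Replace Windows line endings (CR LF) with Unix line endings (LF).
    normalized = text.replace("\r\n", "\n")
    # Encode explicitly instead of relying on the platform default encoding.
    return normalized.encode("utf-8")


payload = to_wire_bytes("1,caf\u00e9\r\n2,tea\n")
```

Inside a handler, `payload` could then be written with something like `upload.binary_writer().write(payload)`.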
|
||
If you want to refuse an up- or download, call `upload.send_error()` to send an | ||
error message. This is only possible before any calls to `text_writer()` and | ||
`binary_writer()`. | ||
|
||
For custom downloaders the situation is similar, except that instead of | ||
`text_writer` and `binary_writer`, the `download` parameter offers | ||
`download.text_reader()` and `download.binary_reader()`. | ||
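A download handler might be sketched as follows. This version does not import pymonetdb, so that it can be tried without a server: `FakeDownload` is a hypothetical stand-in for the `download` parameter, and the `handle_download(download, filename, text_mode)` signature is an assumption modelled on `handle_upload`. A real handler would subclass `pymonetdb.Downloader`:

```python
import io


class CollectingDownloader:
    """Keep each downloaded file in memory, keyed by file name (sketch only)."""

    def __init__(self):
        self.files = {}

    def handle_download(self, download, filename, text_mode):
        # text_reader() is assumed to yield a text-mode file-like object.
        reader = download.text_reader()
        self.files[filename] = reader.read()


class FakeDownload:
    """Hypothetical stand-in for the `download` parameter, for offline use."""

    def __init__(self, content):
        self._content = content

    def text_reader(self):
        return io.StringIO(self._content)


handler = CollectingDownloader()
handler.handle_download(FakeDownload("1,one\n2,two\n"), "out.csv", True)
```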
|
||
|
||
Skip amount | ||
----------- | ||
|
||
MonetDB's :code:`COPY INTO` statement allows you to skip, for example, the first | ||
line in a file using the modifier :code:`OFFSET 2`. In such a case, | ||
the `skip_amount` parameter to `handle_upload` will be greater than zero. | ||
|
||
Note that the offset in the SQL statement is 1-based, whereas the `skip_amount` | ||
parameter has already been converted to be 0-based. In the example above | ||
this allowed us to write :code:`for i in range(skip_amount, 1000):` rather | ||
than :code:`for i in range(1000):`. | ||
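The relation between the two numbers can be sketched in plain Python. Both function names are hypothetical; pymonetdb performs the conversion itself before it calls `handle_upload`, so this only illustrates the arithmetic for offsets of 1 or more:

```python
import itertools


def offset_to_skip_amount(offset):
    """Convert a 1-based SQL OFFSET into a 0-based number of records to skip.

    Hypothetical helper; assumes offset >= 1.
    """
    return offset - 1


def rows_to_upload(skip_amount, total=1000):
    """Yield CSV lines, starting at the first record the server wants."""
    for i in range(skip_amount, total):
        yield f"{i},number{i}"


# COPY 10 RECORDS OFFSET 7: skip records 0..5, then take ten records.
skip = offset_to_skip_amount(7)
first_ten = list(itertools.islice(rows_to_upload(skip), 10))
```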
|
||
|
||
Cancellation | ||
------------ | ||
|
||
If the server does not need all uploaded data, for example if you did:: | ||
|
||
    COPY 100 RECORDS INTO mytable FROM 'data.csv' ON CLIENT | ||
|
||
the server may at some point cancel the upload. This does not happen instantly: | ||
from time to time, pymonetdb explicitly asks the server whether it is still | ||
interested. By default this happens after every MiB of data, but the interval can be | ||
configured using `upload.set_chunk_size()`. If the server answers that it is no | ||
longer interested, pymonetdb will discard any further data written to the | ||
writer. It is recommended to occasionally call `upload.is_cancelled()` to check | ||
for this and exit early if the upload has been cancelled. | ||
|
||
Upload handlers also have an optional method `cancel()` that you can override. | ||
This method is called when pymonetdb receives the cancellation request. | ||
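A handler that generates data could poll `upload.is_cancelled()` every so many rows, roughly as in the sketch below. `FakeUpload` is a hypothetical stand-in that pretends the server loses interest after a given number of rows. Together with the function name and the `check_every` parameter, it is an assumption made for illustration:

```python
import io


def upload_numbers(upload, total, check_every=1000):
    """Write up to `total` CSV rows, stopping early once the upload is cancelled.

    Returns the number of rows written.
    """
    writer = upload.text_writer()
    written = 0
    for i in range(total):
        # Polling on every row would be wasteful, so only check periodically.
        if i % check_every == 0 and upload.is_cancelled():
            break
        writer.write(f"{i},number{i}\n")
        written += 1
    return written


class FakeUpload:
    """Hypothetical stand-in for `upload`; reports cancellation after `limit` rows."""

    def __init__(self, limit):
        self.buffer = io.StringIO()
        self.limit = limit

    def text_writer(self):
        return self.buffer

    def is_cancelled(self):
        return self.buffer.getvalue().count("\n") >= self.limit


rows_written = upload_numbers(FakeUpload(limit=2500), total=10000)
```

Because the check only runs every `check_every` rows, the handler may write somewhat more than the server needs before it notices the cancellation.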
|
||
|
||
Copying data from or to a file-like object | ||
------------------------------------------ | ||
|
||
If you are moving large amounts of data between pymonetdb and a file-like object | ||
such as a file, Python's `copyfileobj`_ function may come in handy: | ||
|
||
.. literalinclude:: examples/uploadsafe.py | ||
   :pyobject: MyUploader | ||
|
||
However, note that copyfileobj does not handle cancellations as described above. | ||
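One possible workaround is a small copy loop that checks for cancellation between chunks. The function below is a sketch under that assumption and is not part of pymonetdb or `shutil`. It takes the cancellation check as a callable, so a handler could pass `upload.is_cancelled`:

```python
import io


def copy_until_cancelled(src, dst, is_cancelled, chunk_size=64 * 1024):
    """Copy from `src` to `dst` in chunks until EOF or until `is_cancelled()` is true.

    Returns the number of characters (or bytes, for binary files) copied.
    """
    copied = 0
    while not is_cancelled():
        chunk = src.read(chunk_size)
        if not chunk:
            break  # end of file
        dst.write(chunk)
        copied += len(chunk)
    return copied


# Demonstration with in-memory files: stop once 30 characters have been copied.
source = io.StringIO("a" * 100)
target = io.StringIO()
copied = copy_until_cancelled(
    source, target, lambda: len(target.getvalue()) >= 30, chunk_size=10
)
```

In a handler this could replace the `shutil.copyfileobj(f, tw)` call with `copy_until_cancelled(f, tw, upload.is_cancelled)`.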
|
||
.. _copyfileobj: https://docs.python.org/3/library/shutil.html#shutil.copyfileobj | ||
|
||
|
||
Security considerations | ||
----------------------- | ||
|
||
If your handler accesses the file system or the network, it is absolutely critical | ||
to carefully validate the file name you are given. Otherwise an attacker can take | ||
over the server or the connection to the server and cause great damage. | ||
|
||
An example of how to validate file system paths is given in the code sample above. | ||
Similar considerations apply to text that is inserted into network URLs and other | ||
resource identifiers. |
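As one possible way to perform such a check, the sketch below compares whole path components using `pathlib.PurePath.relative_to`, which is available before Python 3.9. A plain string-prefix comparison would also accept a sibling directory such as `datadir2`. The function name `resolve_inside` is hypothetical:

```python
import pathlib
import tempfile


def resolve_inside(base_dir, filename):
    """Return the resolved path if it lies inside `base_dir`, otherwise None."""
    base = pathlib.Path(base_dir).resolve()
    candidate = (base / filename).resolve()
    try:
        # relative_to() raises ValueError if candidate is not located under base.
        candidate.relative_to(base)
    except ValueError:
        return None
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp) / "datadir"
    base.mkdir()
    expected = base.resolve() / "data.csv"
    allowed = resolve_inside(base, "data.csv")
    sibling = resolve_inside(base, "../datadir2/data.csv")
    absolute = resolve_inside(base, "/etc/passwd")
```

A handler could call `upload.send_error('Forbidden')` whenever the function returns `None`.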
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,7 @@ Contents: | |
:maxdepth: 2 | ||
|
||
introduction | ||
filetransfers | ||
examples | ||
api | ||
development | ||
|