Only one executing per subprocess being executed #2
Comments
I managed to get this working. For some reason, when using fewer than 12 subprocesses it simply gets stuck until the Lambda execution times out. I forced it to 24 and it worked, but it consumed way too much RAM. I'm not sure whether this happens because of my workload or because of how the child processes are being distributed. Something I noticed is that, running locally, my CPU usage graph looks like a square wave. Not sure why; I will take a look this weekend.
What does CreatePageWithWatermark() do? The queuing is currently a bit dumb. It assigns all tasks up front, assuming that each task takes approximately the same amount of time. Does that assumption hold for your use case?
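(For illustration only, here is a rough sketch of what "assigning all tasks up front" means; the helper names and round-robin chunking are mine, not the library's actual code:)

```python
# Hypothetical illustration of static up-front task assignment,
# not lambda_multiprocessing's actual implementation.
from multiprocessing import Process, Queue

def static_assign(tasks, num_workers):
    # Decide each worker's share before any work starts (round-robin slices)
    return [tasks[i::num_workers] for i in range(num_workers)]

def worker(chunk, results):
    for func, args in chunk:
        results.put(func(*args))

def run_all(tasks, num_workers=4):
    # If one slice happens to contain the slow tasks, that worker becomes
    # the bottleneck while the others finish early and sit idle.
    results = Queue()
    procs = [Process(target=worker, args=(chunk, results))
             for chunk in static_assign(tasks, num_workers)]
    for p in procs:
        p.start()
    out = [results.get() for _ in range(len(tasks))]
    for p in procs:
        p.join()
    return out
```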
I'm working with a lot of buffers to avoid saving the PDF pages to disk (though that may end up being the solution for this scenario). The CreatePageWithWatermark() function mostly reads a page from the provided buffer, generates a page, and merges both into a new buffer. It took a while, but I managed to recreate the issue with the code below:

```python
from lambda_multiprocessing import Pool as LambdaPool
import timeit
import time
import requests


def CreatePageWithWatermark(title, author, page, buffer):
    time.sleep(5)
    print(len(buffer))
    return buffer


def PDFMergeWorker(event, context):
    start_time = timeit.default_timer()

    url = 'https://pdfs.semanticscholar.org/c029/baf196f33050ceea9ecbf90f054fd5654277.pdf'
    r = requests.get(url, stream=True)

    pages_total = 100
    pages_data = [(
        'test',
        'test',
        current_page,
        r.content,
    ) for current_page in range(0, 100)]

    with LambdaPool() as executor:
        rendered_pages = executor.starmap(CreatePageWithWatermark, pages_data)

    # Speedometer
    stop_time = timeit.default_timer()
    delta_time = stop_time - start_time
    print('Time: ', delta_time)
    print('Time per Page: ', delta_time / pages_total)

    return 'Done'
```

The behavior is the same. On my machine there are only 16 print() calls (I have 16 threads), and on AWS Lambda only 6 print() calls are made (I set 10 GB for this Lambda). If you remove the sleep() call the code runs fine; only with both does the code crash.
Hmm, interesting. I have seen an issue previously where print statements inside a for loop containing a sleep, in one process, get buffered. So if you're printing thousands of lines, the stdout buffer will fill and get flushed, but at low volume stuff stays in the buffer longer than you'd expect. I would expect process termination to flush the stdout buffer, but I don't know what special stuff Lambda does with stdout. If you swap print for sys.stdout.write, do you get the same behaviour?

Also, in your MWE you're doing only one requests.get() call, and then attempting to operate on the .content of the response 100 times concurrently. Are you sure that whatever requests returns is itself concurrency safe? I don't know what len() does to a buffer object. Does it consume it?

(I will play around to debug this weekend.)
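(A sketch of the two changes suggested above: an explicit, flushed write instead of print, and giving each task its own copy of the downloaded bytes rather than sharing the same response attribute. The names mirror the snippet earlier in the thread but this is only an illustration:)

```python
import sys
import time

def CreatePageWithWatermark(title, author, page, buffer):
    time.sleep(5)
    # Write and flush explicitly so the line is not held in the stdout buffer
    sys.stdout.write(f'{len(buffer)}\n')
    sys.stdout.flush()
    return buffer

# And give each task its own copy of the downloaded bytes instead of
# sharing the same response attribute across all 100 tasks:
# pdf_bytes = r.content
# pages_data = [('test', 'test', page, bytes(pdf_bytes)) for page in range(100)]
```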
Have you looked at this again? I'm not able to reproduce the bug because the example is not a minimal working example. Most importantly:
But also:
Here's a MWE (except it doesn't reproduce the issue). (It could be reduced even further, depending on what the issue is.)
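(The MWE code block did not survive here; the sketch below is a hypothetical reconstruction of that kind of minimal example, using a sleep plus a fixed-size shared bytes payload instead of a network call. The names and sizes are assumptions, not the original snippet:)

```python
# Hypothetical minimal example, not the original snippet from this comment.
from lambda_multiprocessing import Pool
import time

def work(i, payload):
    time.sleep(5)
    print(i, len(payload))
    return i

def handler(event, context):
    payload = b'x' * (5 * 1024 * 1024)  # ~5 MB stand-in for the PDF bytes
    args = [(i, payload) for i in range(100)]
    with Pool() as pool:
        results = pool.starmap(work, args)
    return len(results)
```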
Hi there,
I'm trying to use this package, but for some weird reason each subprocess is only executing once and the rest of the code hangs.
I'm using Python 3.9; here is a snippet of the code:
Each generators.CreatePageWithWatermark execution prints the current page, and only 1 page is processed by each subprocess. If I have fewer pages than the total number of cores, the code runs just fine. Just for fun I set 1000 subprocesses and my PC froze, but all pages were rendered.
The same behavior happens on AWS Lambda.
Thanks! That's a great little package.
Edit: so now (out of the blue) it's working on my PC but not on AWS Lambda; only 6 executions are happening. Very weird.