Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

toxav.c causing low FPS due to bad memory allocation #1601

Open
vassad opened this issue Jul 14, 2016 · 22 comments
Open

toxav.c causing low FPS due to bad memory allocation #1601

vassad opened this issue Jul 14, 2016 · 22 comments

Comments

@vassad
Copy link

vassad commented Jul 14, 2016

Android tox client (Antox) is suffering from low FPS. Investigating this issue, i came to the native implementation of tox, which could faster up much the FPS.

toxav_video_send_frame method in toxav.c has quite heavy and unnecessary memory allocation for each frame to encode.

vpx_img_alloc(&img, VPX_IMG_FMT_I420, width, height, 0);

  • this one could be allocated only once, when the video capture is started.
memcpy(img.planes[VPX_PLANE_Y], y, width * height);
memcpy(img.planes[VPX_PLANE_U], u, (width / 2) * (height / 2));
memcpy(img.planes[VPX_PLANE_V], v, (width / 2) * (height / 2));
  • these could use the links to y,u,v arrays, instead of copying the data each frame. Or am i wrong, and vpx codec needs the y,u,v data to be side-by-side organized in memory, and we cannot avoid copying the data?

So:

  1. We can create the static field for the img each time the video capture begins, and free it when video capture stops. This could lead to the proplems when several threads will want to access this object, so i want a discussion to decide the best way to solve the issue.
  2. We can pass the link to y,u,v arrays instead of copying the data - or no?
@vassad vassad changed the title toxav.c causing big FPS due to bad memory allocation toxav.c causing low FPS due to bad memory allocation Jul 14, 2016
@GrayHatter
Copy link
Collaborator

@mannol wrote the ToxAV code, so if it's an issue in toxcore, they're aware.

But, IIRC the issue of low FPS on video was something subliun was trying to work through. And the end result of that conversation was that Java is just too inefficient when it comes to converting YUV data on arm devices. So my guess at this point is the issue is more likely Antox's rather than inside toxcore.

@vassad
Copy link
Author

vassad commented Jul 15, 2016

Hey @GrayHatter,
The java side is also written with bad memory allocation, which i managed to solve - I made the java side to reduce to ~20ms average per frame from 1900ms, but the native side is still ~600ms per frame. And it is almost the only problem which lowered the FPS dramatically - array allocations and copying.

I took a look at VideoSendThread thread implementation in Call.scala
https://github.com/Antox/Antox/blob/master/app/src/main/scala/chat/tox/antox/av/Call.scala
I made tests to check the FPS. Tested on LG L65 D285, sending video from Antox to qTox (Windows) with 640x480 resolution. Each frame was sent during 2500ms average, so the FPS was 0.4. The toxcore native side of code ToxSingleton.toxAv.videoSendFrame took about 500-600ms average from these 2500, the rest 1900ms took the scala methodsFormatConversions.nv21toYuv420. Concerning the java/scala side, I managed to save about 1800ms only by reducing the memory allocation for big arrays in FormatConversions.scala (allocating an array of 300k is quite a heavy operation, especially when working with camera preview, and especially on low-performance devices - at least it is always a bad practice to allocate any data all the time, when could be allocated once).

So the heaviest and longest operations here are in FormatConversions.scala:
https://github.com/Antox/Antox/blob/master/app/src/main/scala/chat/tox/antox/av/FormatConversions.scala

  1. val data = new Array[Byte](nv21.data.length)
  • allocating over 400kBytes of dataper frame only once instead of each frame saves ~500ms
  1. Convert.rotateNV21(nv21.data, data, nv21.width, nv21.height, nv21.rotation) ~700ms
  • this one is really java inefficient, but the rotation could be performed on the client, not the sender
val y = data.slice(0, yLength)
val u = new Array[Byte](data.length / 6)
val v = new Array[Byte](data.length / 6)

-data.slice takes ~1200ms, but the use of System.arrayCopy saves almost 1195ms of 1200ms :)

  • allocating uand v only once saves almost the rest ~100-200ms
while (i < data.length / 3) {
if (i % 2 == 0) {
  v(i / 2) = data(yLength + i)
} else {
  u(i / 2) = data(yLength + i)
}
  • here replacing i/2 toi >> 1 also saves a bit (~50ms, which is also essential for big FPS)

////////////////////////////////////////////////////
So, concluding, I managed to optimize a bit the code for the Java side of Antox which increases the FPS dramatically, but the native toxcore side still takes about 500ms per frame for 640x480 to encode and send. Considering the effect of heavy operations during big memory allocations per each frame, it is really necessary to handle it carefully also in the native side, despite the fact that the high-performance devices handle this faster. and despite the nature of C code comparing to Java - working with memory allocations of such a size needs to be as rare, as possible (especially when the size is big - so anytime working with video broadcasting, and at least - allocating memory every time, instead of doing it on video broadcasting start - is useless) ;)

@mannol
Copy link
Contributor

mannol commented Jul 15, 2016

@vassad There is a one big allocation (that I can think of) that can be removed (the one you mentioned), I'll try to do it today.

@vassad
Copy link
Author

vassad commented Jul 15, 2016

@mannol Thanks a lot! Will wait for your progress :) It is great to know that the project is such alive)

@isotoxin
Copy link

Here is my fix of this code:
https://github.com/isotoxin/toxcore/blob/master/toxav/toxav.c#L786
Works fine for a long time

To tox devs: if you guys thing your code ok, tox will die.

@mannol
Copy link
Contributor

mannol commented Jul 15, 2016

Ah yes, that is the fix I'm talking about. @isotoxin Would you mind making a PR instead of me?

@Mikhael-Danilov
Copy link

Don't you mind do it C99 style (with designators) initialization? It would be easier to read and maintain.

@vassad
Copy link
Author

vassad commented Jul 15, 2016

@isotoxin wonderful fix :) but why didn't you PR it for the the last 9 months since the fix? testing the solution perhaps)

@SkyzohKey
Copy link

SkyzohKey commented Jul 15, 2016

why didn't you PR it for the the last 9 months since the fix

Cuz PR on this repo' are never merged ? You'd be better to propose the PR on TokTok/toxcore ;)

@isotoxin
Copy link

Why cant you fix it without PR? Somebody told me my fix is not compatible with old version of libvpx. Im not a linux programmer and I dont know how to resolve this issue. (Ok, I do not even want to think about this problem, because I hate linux way of library usage) You see my code, why you do not want to make toxcore a little better without any PR's? For the sake of the good in the world.

@mannol
Copy link
Contributor

mannol commented Jul 15, 2016

Fine.

@vassad
Copy link
Author

vassad commented Jul 15, 2016

Wow, chill out) Let's just do everything to make it finally be fixed :)

@vassad
Copy link
Author

vassad commented Jul 18, 2016

hey @mannol , the project failed to build due to the old VPX version installed on the Travis machine (Ubuntu 14.04, vpx 1.3.2) - that's why the .cs, .range, .bit_depth, .d_w, .r_w, .r_h, .x_chroma_shift, .y_chroma_shift and .fb_priv are stated as UNDEFINED FIELDS.

So, there are main four ways here:

  1. apply only fields, which are acceptable by 1.3.2 vpx version
  2. setup Travis to update the vpx version (and probably other libs versions) to the trunk
  3. move the allocation of the img to encode, using vpx api allocate method - from toxav.c to video.c (VCSession struct, where vpx encoder and decoder already leave), to the vc_reconfigure_encoder, vc_new, vc_kill methods.
  4. build.tox.chat has all the up-to-date libs. Is it possible to somehow create a job there, using your commit? And how to do it?:)

@mannol
Copy link
Contributor

mannol commented Jul 18, 2016

Do you mind image data copy?

@vassad
Copy link
Author

vassad commented Jul 18, 2016

A pull request will be soon, for the 3) way from @Mikhael-Danilov ...
I meant to use vpx_img_alloc - the way it is implemented right now in toxav.c - but not to allocate it all the time - allocate it (using this api method, which Travis won't blame) when the videocall starts, deallocate when it finishes. It is good practice the same time for the future also, and should work to avoid the BUILD FAILED on Travis and to successfully merge :)

And whant about the build.tox.chat? (Jenkins builder) How we can use it for solving and testing the issues?

@mannol
Copy link
Contributor

mannol commented Jul 18, 2016

Great! I'll close my PR then :^)

@Mikhael-Danilov
Copy link

#1606

@vassad
Copy link
Author

vassad commented Jul 18, 2016

Make was successfull, tests success is no worse than the master toxcore results:

PASS: encryptsave_test
PASS: messenger_autotest
PASS: crypto_test
PASS: network_test
PASS: assoc_test
FAIL: onion_test
FAIL: TCP_test
FAIL: tox_test
FAIL: dht_autotest
PASS: toxav_basic_test

Perhaps it should be merged?)

@Mikhael-Danilov
Copy link

Actually it passes all tests (as master do) on my machine. But for some reason lots of tests fail on Travis...

@im-grey
Copy link

im-grey commented Aug 3, 2016

Thank you all for your work!

@GrayHatter
Copy link
Collaborator

@Mikhael-Danilov that's Travis for you

@Mikhael-Danilov
Copy link

@GrayHatter
Travis-ci

this one for master:
https://travis-ci.org/irungentoo/toxcore/jobs/147799403

this one for my PR:
https://travis-ci.org/irungentoo/toxcore/jobs/148003401

Or... This travis-ci builds&tests are deprecated and still mentioned in README.md by accident?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants