Lots of ML theory behind deep learning, but the truth is that what made things work was clever engineering & programming abstractions (Caffe, TensorFlow, PyTorch, ...). These design choices now dictate our workflow, and the next major AI breakthrough will require similar tools.
- Thomas Lux: The best theory is inspired by great experiments and vice versa. Having one without the other is probably reckless (likely to waste time & effort). I feel like we’re leaning a little too hard on engineering right now because no one is making practical enough theory.
- Matthäus G. Chajdas: The more hardware is purpose-built for those frameworks, though, the harder it will be to change, given that HW investments need to be recouped. Right now it's all about sparse matrix multiplies, and if a new way of doing things shows up, it had better map well to that HW.
reference: https://twitter.com/MattNiessner/status/1473765870757453832?s=20
(1/2) Just hash occupied voxels, H(x, y, z) = (x·p₁ ⊕ y·p₂ ⊕ z·p₃) mod n, and store them in a linear array!
The voxel hashing method proposed this for 3D reconstruction in 2013, but it's used in many applications such as sparse convolutional networks!
(2/2) To access the value of a voxel, simply evaluate the hash, e.g., for a filter kernel or to update a TSDF value.
Why are hashes so powerful?
Data access and insertion are O(1), ignoring collisions. In comparison, for tree structures such as octrees, that's O(log n).
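A minimal Python sketch of the idea (hypothetical, not the paper's GPU data structure): the primes are the ones commonly used for this spatial hash, while the table size and the linear-probing collision handling are purely illustrative.

```python
P1, P2, P3 = 73856093, 19349669, 83492791  # primes commonly used for spatial hashing

class VoxelHashMap:
    def __init__(self, n=1 << 20):
        self.n = n
        self.keys = [None] * n     # (x, y, z) integer voxel coordinates
        self.values = [None] * n   # e.g. a TSDF value per voxel

    def _hash(self, x, y, z):
        return ((x * P1) ^ (y * P2) ^ (z * P3)) % self.n

    def insert(self, x, y, z, value):
        i = self._hash(x, y, z)
        while self.keys[i] is not None and self.keys[i] != (x, y, z):
            i = (i + 1) % self.n   # linear probing on collisions
        self.keys[i] = (x, y, z)
        self.values[i] = value

    def lookup(self, x, y, z):
        i = self._hash(x, y, z)
        while self.keys[i] is not None:
            if self.keys[i] == (x, y, z):
                return self.values[i]
            i = (i + 1) % self.n
        return None                # voxel not occupied

# Usage: store a TSDF value for an occupied voxel and read it back.
grid = VoxelHashMap()
grid.insert(12, -4, 7, 0.03)
print(grid.lookup(12, -4, 7))  # 0.03
print(grid.lookup(0, 0, 0))    # None
```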
[Andreas Kirsch] Wasn't that a paper by @sylefeb, too? https://t.co/SUwJlGNxcY
[Sylvain Lefebvre] Thanks for the mention! We have some other works on the topic too (e.g. showing that coalesced accesses are possible through a hash on the GPU: https://hal.inria.fr/inria-00624777/document )
[game programmer dude] What do you do with dynamic objects moving through the grid? Do you need to remove them from the voxel they were in and then add them to the voxel they move into? Seems like removal of an object from one voxel takes O(n).
[Morten Vassvik] I can't find a list of the bounding box spans of all the test cases in the paper. In most cases at least one dimension is missing, e.g. Figure 1 doesn't mention the depth, so I'm not able to compute the efficient sparsity and span of each case. Are these available somewhere? :)
Happening tomorrow! #NeurIPS 2021 workshop on physical reasoning and inductive biases for the real world!
Join us at 11 am EST (8 am PST, 4 pm GMT/UTC)
For schedule and accepted papers, see https://physical-reasoning.github.io
Fantastic speakers and panelists from industry/academia
reference: https://twitter.com/_krishna_murthy/status/1470524128922849282?s=20
Our video lectures from the course Multiple View Geometry for Computer Vision Applications at Weizmann Institute of Science are now available online: https://youtube.com/playlist?list=PLEB45naDUsF2vpvdxZ72Jjl8ZEKISv4Nh
reference: https://twitter.com/yoni_kasten/status/1472970833622712329?s=20
-
abs: https://arxiv.org/abs/2112.06909
I almost think the non-compatible combinations are more promising for art.
reference: https://twitter.com/ak92501/status/1470589245874180096?s=20
-
We collaborated with Juan Murphy at ONE MORE POLY. His vision: bridging physical + digital content to give creative agencies and brands a new opportunity to create immersive AR and VR content.
reference: https://twitter.com/EveryPointIO/status/1470541263040487424?s=20
-
Short news story about our work on repulsive shape optimization: https://scs.cmu.edu/news/2021/repulsive-energies
This is joint work with recent @SCSatCMU grad Chris Yu, Henrik Schumacher from RWTH Aachen University/TU Chemnitz, and high school student Caleb Brakensiek.
homepage: https://www.scs.cmu.edu/news/2021/repulsive-energies
-
This Tuesday, we'll discuss the paper "Equivariant Subgraph Aggregation Networks" with the authors.
Join us on Zoom at 4pm GMT: https://hannes-stark.com/logag-reading-group
They propose the interesting idea that we can get more expressive GNNs by considering perturbations (subgraphs) of the original graph in addition to the original graph itself.
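A rough, hypothetical sketch of that idea (not the authors' architecture): run the same toy message-passing network on each node-deleted copy of a graph and aggregate the resulting embeddings symmetrically.

```python
import numpy as np

def gnn_embed(adj, feats, steps=2):
    """Toy message passing: repeatedly average neighbor features, then sum-pool."""
    a = adj + np.eye(len(adj))               # add self-loops
    a = a / a.sum(axis=1, keepdims=True)     # row-normalize
    h = feats
    for _ in range(steps):
        h = a @ h
    return h.sum(axis=0)                     # graph-level readout

def subgraph_bag_embed(adj, feats):
    """Embed every node-deleted subgraph with the same GNN, then aggregate
    the bag of embeddings with a symmetric (hence equivariant) mean."""
    embeddings = []
    for v in range(len(adj)):
        keep = [u for u in range(len(adj)) if u != v]
        embeddings.append(gnn_embed(adj[np.ix_(keep, keep)], feats[keep]))
    return np.mean(embeddings, axis=0)

# Tiny example: a 4-cycle with one-hot node features.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], float)
feats = np.eye(4)
print(subgraph_bag_embed(adj, feats))
```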
-
"Self-Calibrating Neural Radiance Fields" https://arxiv.org/abs/2108.13826
Noteworthy :)
-
Today, our brilliant student Lingxiao Li presented "Interactive All-Hex Meshing via Cuboid Decomposition" at @SIGGRAPHAsia. It provides a UI+clever geometric algorithms for interactive polycube-based hex meshing. This is the first time I've been able to hex mesh reliably!
This was a collaboration between students in our group: Lingxiao + Paul Zhang + @_DimaSmirnov + @MazAbulnaga. I'm so proud they worked together this well; you can see each of their insights in this work.
Download & try (it's fun!): https://github.com/lingxiaoli94/interactive-hex-meshing
Paper: https://arxiv.org/pdf/2109.06279.pdf
reference: https://twitter.com/JustinMSolomon/status/1471333735270342662?s=20
-
My 1st graphics paper, way back in 2006, was on GPU ray tracing. The idea was to enable fast geometry updates using a "min-max MIP map": http://cs.cmu.edu/~kmcrane/Projects/RayTracingGeometryImages/paper.pdf
@boubek & co revisit this idea for displacement maps, rather than the full geometry. Nice idea—check it out below!
To recap the basic approach: you start by storing your geometry in the very cool "geometry image" format of Gu et al, i.e., an image where the R,G,B components of a pixel describe X,Y,Z locations in space: https://hhoppe.com/proj/gim/. A quad of pixels then describes two triangles.
An ordinary MIP map stores an "image pyramid" where each image is 1/2 the size of the previous one, downsampled by averaging. This "pre-filtering" lets you efficiently approximate integrals over the many texels covered by one pixel. Basically all textures today are drawn this way.
As a total aside, the original MIP mapping paper (by Lance Williams in 1983) used this ridiculous frog image in most of the figures. I'm kind of sad this never became a standard test image in computer graphics. (Almost looks like something @_AlecJacobson might draw…)
A min-max MIP map does something a bit different: rather than averaging pixels, it takes the minimum (or maximum) in each component, generating a min (or max) image pyramid. Since the RGB values encode XYZ positions, the min/max values give the corners of a bounding box!
So, a min/max pair of image pyramids provides a bounding volume hierarchy (BVH) for the geometry. What's cool about this BVH is that it's super fast to compute/update—especially for a GPU in 2006! Just do repeated downsampling, but with a different arithmetic operation.
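A minimal numpy sketch of that construction, assuming a square, power-of-two geometry image and ignoring the one-texel overlap a real implementation needs so that triangles straddling block boundaries stay inside their box:

```python
import numpy as np

def build_minmax_pyramid(geometry_image):
    """geometry_image: (H, W, 3) array whose R,G,B channels store X,Y,Z positions.
    Level k of each pyramid holds, per texel, one corner of the bounding box of
    the corresponding 2^k x 2^k block of surface points."""
    mins, maxs = [geometry_image], [geometry_image]
    while mins[-1].shape[0] > 1:
        m, M = mins[-1], maxs[-1]
        # Group 2x2 texel quads and reduce with min/max instead of averaging.
        m = m.reshape(m.shape[0] // 2, 2, m.shape[1] // 2, 2, 3)
        M = M.reshape(M.shape[0] // 2, 2, M.shape[1] // 2, 2, 3)
        mins.append(m.min(axis=(1, 3)))
        maxs.append(M.max(axis=(1, 3)))
    return mins, maxs

# Example: a random 8x8 geometry image; the coarsest level is the whole surface's AABB.
gim = np.random.rand(8, 8, 3)
mins, maxs = build_minmax_pyramid(gim)
print(mins[-1][0, 0], maxs[-1][0, 0])
```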
You can take this idea one step further and precompute, for each texel, where you should go next if your ray hits or misses that bounding box. This way, BVH traversal is "stack free," and can be nicely streamed through the GPU.
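The generic hit/miss-link ("ropes") pattern looks roughly like the sketch below. This is only an illustration of the stack-free idea, with leaves reporting their box entry distance as a stand-in for real triangle intersection, not the paper's actual layout.

```python
import numpy as np

def ray_box_t(origin, inv_dir, box_min, box_max):
    """Slab test: return the entry distance t if the ray hits the AABB, else None."""
    t0 = (box_min - origin) * inv_dir
    t1 = (box_max - origin) * inv_dir
    tmin = np.minimum(t0, t1).max()
    tmax = np.maximum(t0, t1).min()
    return tmin if tmax >= max(tmin, 0.0) else None

def traverse_stack_free(nodes, origin, direction):
    """nodes[i] = (box_min, box_max, is_leaf, hit_link, miss_link).
    hit_link: next node if the box is hit (first child, or the node after a leaf).
    miss_link: next node if the box is missed (skips the whole subtree).
    A link of -1 ends traversal."""
    inv_dir = 1.0 / direction
    i, best = 0, np.inf
    while i != -1:
        box_min, box_max, is_leaf, hit_link, miss_link = nodes[i]
        t = ray_box_t(origin, inv_dir, box_min, box_max)
        if t is None:
            i = miss_link
        else:
            if is_leaf:
                best = min(best, t)   # stand-in for triangle intersection
            i = hit_link
    return best

# Tiny two-leaf "BVH" in depth-first order with precomputed hit/miss links.
nodes = [
    (np.zeros(3), np.array([2., 1., 1.]), False, 1, -1),               # root
    (np.zeros(3), np.ones(3), True, 2, 2),                             # left leaf
    (np.array([1., 0., 0.]), np.array([2., 1., 1.]), True, -1, -1),    # right leaf
]
print(traverse_stack_free(nodes, np.array([-1., 0.3, 0.3]), np.array([1., 0.05, 0.02])))
```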
As another total aside, I'm reminded @BrianKaris (developer on the UE5 Nanite engine) mused about geometry images in a quest for geometry virtualization: http://graphicrants.blogspot.com/2009/01/virtual-geometry-images.html. In the end Nanite doesn't use 'em—but hey, it at least uses a min MIP map for occlusion culling!
Anyway, the point of all this is that by reducing BVH construction to min/max downsampling, it becomes almost a no-op to update the BVH for a deforming scene. (Check out our weirdo video above, with talking ogres and hopping teapots.)
The @AdobeResearch paper takes this whole idea yet another step further, using a min-max MIP map just for displacements on top of a coarse mesh—they also explain how to do traversal when geometry deforms. But I'll let them do the rest of the talking!
reference: https://twitter.com/keenanisalive/status/1471802366596890624?s=20
-
We tackle the 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝘃𝗲 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝘁𝗿𝗶𝗹𝗲𝗺𝗺𝗮 with the novel 𝗱𝗲𝗻𝗼𝗶𝘀𝗶𝗻𝗴 𝗱𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻 𝗚𝗔𝗡, a diffusion model designed for fast sampling. https://nvlabs.github.io/denoising-diffusion-gan Zhisheng Xiao @karsten_kreis ft. Peanut.
What's the Generative Learning Trilemma?
Existing generative models often struggle to simultaneously address three key requirements: high sample quality, strong mode coverage, and fast sampling. We call the challenge imposed by these requirements the generative learning trilemma, as current models often trade one off against the others.

In this paper, we tackle the trilemma by reformulating denoising diffusion models specifically for 𝗳𝗮𝘀𝘁 𝘀𝗮𝗺𝗽𝗹𝗶𝗻𝗴 while maintaining strong mode coverage and sample quality. We argue that slow sampling in diffusion models is attributed to the Gaussian assumption in the denoising step, which is justified only for small step sizes. When the reverse process uses larger step sizes (i.e., fewer steps), we need a non-Gaussian, multimodal denoising distribution.

Inspired by this observation, we propose a novel generative model, called denoising diffusion GAN, in which the denoising distributions are modeled with multimodal conditional GANs. In image generation, we observe that our model achieves sample quality and mode coverage competitive with diffusion models while requiring as few as 2 denoising steps, achieving up to a 2,000× speed-up in sampling compared to predictor-corrector sampling on CIFAR-10.
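To make the few-step sampling concrete, here is a schematic Python sketch of the reverse process under the usual variance-preserving Gaussian posterior. The generator is a placeholder and the noise schedule is made up, so this is only the shape of the loop, not the released NVlabs code.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(x_t, z, t):
    """Stand-in for the conditional GAN generator G(x_t, z, t) that predicts a
    clean sample x_0; the real model is a large conditional image generator."""
    return 0.5 * x_t + 0.1 * z  # placeholder computation only

def sample(shape=(4,), alphas_bar=(0.9, 0.1)):
    """Schematic few-step reverse process: the generator proposes x_0, then
    x_{t-1} is drawn from the Gaussian posterior q(x_{t-1} | x_t, x_0).
    alphas_bar[t] plays the role of the cumulative noise level at step t."""
    T = len(alphas_bar)
    x_t = rng.standard_normal(shape)        # x_T ~ N(0, I)
    for t in reversed(range(T)):            # only a couple of denoising steps
        z = rng.standard_normal(shape)      # GAN latent code
        x0_pred = generator(x_t, z, t)
        if t == 0:
            x_t = x0_pred
        else:
            ab_t, ab_prev = alphas_bar[t], alphas_bar[t - 1]
            beta_t = 1.0 - ab_t / ab_prev
            # Posterior q(x_{t-1} | x_t, x_0) of a variance-preserving diffusion.
            coef_x0 = np.sqrt(ab_prev) * beta_t / (1.0 - ab_t)
            coef_xt = np.sqrt(ab_t / ab_prev) * (1.0 - ab_prev) / (1.0 - ab_t)
            var = (1.0 - ab_prev) * beta_t / (1.0 - ab_t)
            x_t = coef_x0 * x0_pred + coef_xt * x_t + np.sqrt(var) * rng.standard_normal(shape)
    return x_t

print(sample())  # a "sample" from the toy 4-dimensional model in 2 steps
```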
Compared to state-of-the-art GANs, we observe strong mode coverage and stable training, while being competitive in sample fidelity on CIFAR-10. Additional results on CelebA-HQ 256 (left) and LSUN Church Outdoor 256 (right).
Project Page: https://nvlabs.github.io/denoising-diffusion-gan/ arXiv: https://arxiv.org/abs/2112.07804 Code will be released at: https://github.com/NVlabs/denoising-diffusion-gan
- Karsten Kreis: Achieving (1) high quality and (2) fast sampling, while (3) faithfully covering the data's modes, has been a challenge in generative learning. Our Denoising Diffusion GAN addresses this. We hope this leads to more interactive applications of diffusion models! Very exciting project!
reference: https://twitter.com/ArashVahdat/status/1471513155054362625?s=20
-
Nice writeup by @CSDatCMU undergrad Hesper Yin (@hyin2147483647) on how to implement fluid simulation on triangle meshes: https://yhesper.github.io/projects/2_project_simpfluid/
In particular she grapples with how to best perform semi-Lagrangian advection (which was never really spelled out by Elcott et al).
video: https://twitter.com/keenanisalive/status/1471147106190733312
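For reference, the basic semi-Lagrangian step on a regular periodic grid looks like the sketch below (nearest-neighbor backtrace only, made-up example data); the writeup's whole point is how to carry this idea over to triangle meshes, which this sketch does not attempt.

```python
import numpy as np

def semi_lagrangian_advect(field, vel_x, vel_y, dt, dx):
    """One semi-Lagrangian advection step on a periodic 2D grid: trace each
    sample point backwards along the velocity and look the field up there
    (nearest-neighbor lookup keeps the sketch short; real solvers interpolate,
    and the mesh version replaces this lookup with walking across triangles)."""
    n, m = field.shape
    i, j = np.meshgrid(np.arange(n), np.arange(m), indexing="ij")
    src_i = np.round(i - dt * vel_y / dx).astype(int) % n   # backtraced row
    src_j = np.round(j - dt * vel_x / dx).astype(int) % m   # backtraced column
    return field[src_i, src_j]

# Example: a single "blob" carried one cell to the right by a uniform velocity.
f = np.zeros((8, 8)); f[4, 2] = 1.0
vx, vy = np.ones((8, 8)), np.zeros((8, 8))
print(np.argwhere(semi_lagrangian_advect(f, vx, vy, dt=1.0, dx=1.0) == 1.0))  # [[4 3]]
```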
-
Efficient Geometry-aware 3D Generative Adversarial Networks
abs: https://arxiv.org/abs/2112.07945
project page: https://matthew-a-chan.github.io/EG3D/
Demonstrates state-of-the-art 3D-aware synthesis on FFHQ and AFHQ Cats, among other experiments.
Video: https://youtu.be/cXxEwI7QbKg
-
Putting People in their Place: Monocular Regression of 3D People in Depth
abs: https://arxiv.org/abs/2112.08274
BEV outperforms existing methods on depth reasoning, child shape estimation, and robustness to occlusion
-
With a displacement-centric acceleration structure built upon affine arithmetic, our new #SIGGRAPHAsia2021 paper, presented today by Théo Thonat, explains how to perform high-speed ray tracing, without tessellation. More info: https://research.adobe.com/publication/tessellation-free-displacement-mapping-for-ray-tracing/ @AdobeResearch @Substance3D
-
Surprised & happy to hear that our paper "RAISR: Rapid and Accurate Image Super Resolution" was selected for the 2021 IEEE Signal Processing Society Best Paper Award. Very proud of my co-authors, especially the very talented @yaniv_romano
reference: https://ieeexplore.ieee.org/abstract/document/7744595
-
New Video! @ffabffrasca @dereklim_lzh and @beabevi_ explain their "Equivariant Subgraph Aggregation Networks" paper! One of the most fun sessions we had so far!
Link: https://youtu.be/voMue3G-_vk
reference: https://twitter.com/HannesStaerk/status/1472596171298332683?s=20