Commit: add palette doc
add palette doc
hkchengrex committed Feb 18, 2023
1 parent 139a745 commit af90122
Showing 2 changed files with 15 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/INFERENCE.md
@@ -1,5 +1,7 @@
# Inference

What is a palette? Why is the output a "colored image"? How do I create those input masks that look like color images? See [PALETTE.md](./PALETTE.md).

1. Set up the datasets following [GETTING_STARTED.md](./GETTING_STARTED.md).
2. Download the pretrained models either using `./scripts/download_models.sh`, or manually and put them in `./saves` (create the folder if it doesn't exist). You can download them from [[GitHub]](https://github.com/hkchengrex/XMem/releases/tag/v1.0) or [[Google Drive]](https://drive.google.com/drive/folders/1QYsog7zNzcxGXTGBzEhMUg8QVJwZB6D1?usp=sharing).

Expand Down
13 changes: 13 additions & 0 deletions docs/PALETTE.md
@@ -0,0 +1,13 @@
# Palette

> Some image formats, such as GIF or PNG, can use a palette, which is a table of (usually) 256 colors to allow for better compression. Basically, instead of representing each pixel with its full color triplet, which takes 24bits (plus eventual 8 more for transparency), they use a 8 bit index that represent the position inside the palette, and thus the color.
-- https://docs.geoserver.org/2.22.x/en/user/tutorials/palettedimage/palettedimage.html

So those mask files that look like color images are single-channel `uint8` arrays under the hood. When `PIL` reads them, it (correctly) gives you a two-dimensional array (`opencv` does not support paletted images, AFAIK). If you instead get a three-dimensional `H*W*3` array, your mask is not actually a paletted mask but just a colored image. Reading and re-saving a paletted mask through `opencv` or MS Paint destroys the palette.
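A minimal sketch of the distinction, building a tiny paletted mask from scratch with `PIL` (the mask contents and palette colors here are made up for illustration):

```python
import io

import numpy as np
from PIL import Image

# Index array: 0 = background, 1 and 2 = two objects.
indices = np.zeros((4, 4), dtype=np.uint8)
indices[:2, :2] = 1
indices[2:, 2:] = 2

# Mode 'P' stores the indices plus a palette (a flat list of 256 RGB triplets).
mask = Image.fromarray(indices, mode='P')
mask.putpalette([0, 0, 0, 255, 0, 0, 0, 255, 0] + [0] * (253 * 3))

# Round-trip through PNG; a file on disk behaves the same way.
buf = io.BytesIO()
mask.save(buf, format='PNG')
buf.seek(0)

# PIL keeps the index array, not the rendered colors.
reloaded = np.array(Image.open(buf))
print(reloaded.shape, reloaded.dtype)  # (4, 4) uint8 -- two-dimensional

# Converting to RGB (what opencv or MS Paint effectively do) discards the
# palette and leaves only colors.
buf.seek(0)
rgb = np.array(Image.open(buf).convert('RGB'))
print(rgb.shape)  # (4, 4, 3) -- three-dimensional, no longer a valid mask
```

If `np.array(Image.open(...))` on your mask is two-dimensional, the palette survived; if it is `H*W*3`, the mask has to be converted back to indices before use.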

Our code, when asked to generate multi-object segmentation (e.g., DAVIS 2017/YouTubeVOS), always reads and writes single-channel masks. If the input has a palette, we reuse it for the output. The code does not care whether a palette is actually present -- it reads grayscale masks just fine.

Importantly, we use `np.unique` to determine the number of objects in the mask. This would fail if:

1. Colored images, instead of paletted masks, are used.
2. The masks have "smooth" edges produced by feathering/downsizing/compression. For example, when drawing a mask in a painting program, set the brush hardness to maximum.
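Both failure modes can be seen directly with `np.unique` (the mask contents below are made up for illustration):

```python
import numpy as np

# A clean paletted mask: pixel values are exactly the object ids.
clean = np.zeros((64, 64), dtype=np.uint8)
clean[10:30, 10:30] = 1
clean[40:60, 40:60] = 2
labels = np.unique(clean)
print(labels)  # [0 1 2] -> background plus 2 objects

# A feathered edge (soft brush, resizing, lossy compression) introduces
# intermediate values; each one would be counted as a separate "object".
feathered = clean.copy()
feathered[30, 10:30] = 128  # a hypothetical anti-aliased boundary row
print(len(np.unique(feathered)))  # 4 distinct values instead of 3
```

This is why the mask must contain only exact object ids: any stray intermediate value inflates the object count.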
