Commit: add palette doc
add palette doc
hkchengrex committed Feb 18, 2023
1 parent 139a745 commit af90122
Showing 2 changed files with 15 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/INFERENCE.md
@@ -1,5 +1,7 @@
# Inference

What is a palette? Why is the output a "colored image"? How do I create those input masks that look like color images? See [PALETTE.md](./PALETTE.md).

1. Set up the datasets following [GETTING_STARTED.md](./GETTING_STARTED.md).
2. Download the pretrained models either using `./scripts/download_models.sh`, or manually and put them in `./saves` (create the folder if it doesn't exist). You can download them from [[GitHub]](https://github.com/hkchengrex/XMem/releases/tag/v1.0) or [[Google Drive]](https://drive.google.com/drive/folders/1QYsog7zNzcxGXTGBzEhMUg8QVJwZB6D1?usp=sharing).

Expand Down
13 changes: 13 additions & 0 deletions docs/PALETTE.md
@@ -0,0 +1,13 @@
# Palette

> Some image formats, such as GIF or PNG, can use a palette, which is a table of (usually) 256 colors to allow for better compression. Basically, instead of representing each pixel with its full color triplet, which takes 24bits (plus eventual 8 more for transparency), they use a 8 bit index that represent the position inside the palette, and thus the color.
-- https://docs.geoserver.org/2.22.x/en/user/tutorials/palettedimage/palettedimage.html

So those mask files that look like color images are single-channel `uint8` arrays under the hood. When `PIL` reads them, it (correctly) gives you a two-dimensional array (`opencv` does not support paletted images, AFAIK). If you instead get a three-dimensional `H*W*3` array, your mask is not actually a paletted mask but just a colored image. Reading and re-saving a paletted mask through `opencv` or MS Paint destroys the palette.
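A minimal sketch of the distinction, building a tiny paletted mask from scratch with `PIL` (the mask contents and palette colors here are made up for illustration):

```python
import io

import numpy as np
from PIL import Image

# Index array: 0 = background, 1 and 2 = two objects.
indices = np.zeros((4, 4), dtype=np.uint8)
indices[:2, :2] = 1
indices[2:, 2:] = 2

# Mode 'P' stores the indices plus a palette (a flat list of 256 RGB triplets).
mask = Image.fromarray(indices, mode='P')
mask.putpalette([0, 0, 0, 255, 0, 0, 0, 255, 0] + [0] * (253 * 3))

# Round-trip through PNG; a file on disk behaves the same way.
buf = io.BytesIO()
mask.save(buf, format='PNG')
buf.seek(0)

# PIL keeps the index array, not the rendered colors.
reloaded = np.array(Image.open(buf))
print(reloaded.shape, reloaded.dtype)  # (4, 4) uint8 -- two-dimensional

# Converting to RGB (what opencv or MS Paint effectively do) discards the
# palette and leaves only colors.
buf.seek(0)
rgb = np.array(Image.open(buf).convert('RGB'))
print(rgb.shape)  # (4, 4, 3) -- three-dimensional, no longer a valid mask
```

If `np.array(Image.open(...))` on your mask is two-dimensional, the palette survived; if it is `H*W*3`, the mask has to be converted back to indices before use.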

Our code, when asked to generate multi-object segmentation (e.g., DAVIS 2017/YouTubeVOS), always reads and writes single-channel masks. If the input has a palette, we reuse it for the output. The code does not care whether a palette is actually present -- it reads grayscale masks just fine.

Importantly, we use `np.unique` to determine the number of objects in the mask. This would fail if:

1. Colored images, instead of paletted masks, are used.
2. The masks have "smooth" edges produced by feathering/downsizing/compression. For example, when drawing a mask in a painting program, set the brush hardness to maximum.
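Both failure modes can be seen directly with `np.unique` (the mask contents below are made up for illustration):

```python
import numpy as np

# A clean paletted mask: pixel values are exactly the object ids.
clean = np.zeros((64, 64), dtype=np.uint8)
clean[10:30, 10:30] = 1
clean[40:60, 40:60] = 2
labels = np.unique(clean)
print(labels)  # [0 1 2] -> background plus 2 objects

# A feathered edge (soft brush, resizing, lossy compression) introduces
# intermediate values; each one would be counted as a separate "object".
feathered = clean.copy()
feathered[30, 10:30] = 128  # a hypothetical anti-aliased boundary row
print(len(np.unique(feathered)))  # 4 distinct values instead of 3
```

This is why the mask must contain only exact object ids: any stray intermediate value inflates the object count.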
