- And speaking of image_embedding: this looks like the data that is compared (by computing a dot product?) against those Base64-encoded blobs in snapshot triggers. After decoding from Base64, the blobs form a raw array of single-precision floating-point numbers: 1024 values for the repeater cameras, 2048 for the main camera. If they take some sample photos and compute an average “image_embedding” vector for the specified NN version in the trigger, it would be like showing an image to the cars and saying “find me more like this one”. Not super effective, but maybe it sometimes works… makes sense?
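If that reading is right, decoding a blob and comparing it against a live embedding is only a few lines. A minimal sketch, assuming the blobs really are raw little-endian float32 arrays (the function names are my own, not anything from the actual firmware):

```python
import base64

import numpy as np

def decode_embedding(blob_b64: str) -> np.ndarray:
    """Decode a Base64 trigger blob into a raw float32 vector
    (1024 values for repeaters, 2048 for the main camera)."""
    raw = base64.b64decode(blob_b64)
    return np.frombuffer(raw, dtype=np.float32)

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Plain dot product; this equals cosine similarity when both
    vectors are already unit-normalized."""
    return float(np.dot(a, b))

# Round-trip check with a made-up 1024-value repeater embedding
vec = np.random.rand(1024).astype(np.float32)
blob = base64.b64encode(vec.tobytes()).decode()
assert decode_embedding(blob).shape == (1024,)
```

Whether the comparison is a plain dot product or a proper cosine distance is a guess either way; the blob sizes (4096 and 8192 bytes) are the only hard evidence.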
- Take for example that last trigger "closed garage". Let's assume the network was not trained to detect garages, but they want to find images of garages. So they take a couple of photos of closed garage doors, calculate what image_embedding looks like in the current network for these photos, and command the cars (via triggers) to find samples where image_embedding looks similar. This would (maybe) allow them to find more examples of scenes that they don't have yet. I don't know how well it would work in practice (I'm sure it would generate lots of false detections and miss many scenes; I don't know how many). So this is just a wild theory.
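The "average embedding" step in that theory is simple to sketch. Assuming the embeddings are plain vectors (the function name is hypothetical):

```python
import numpy as np

def mean_embedding(embeddings):
    """Average several per-image embeddings into one target vector,
    then renormalize so that dot-product comparisons behave like
    cosine similarity."""
    m = np.mean(np.stack(embeddings), axis=0)
    return m / np.linalg.norm(m)

# Made-up example: two "closed garage door" photo embeddings
target = mean_embedding([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
```

The renormalization is my assumption; without it, averaged vectors shrink and a fixed similarity threshold would behave inconsistently.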
- Gathering images based on similarity to a particular image embedding will capture stuff that is similar in an abstract sense. You can adjust how much gets captured by adjusting the filter selectivity: usually you would grab everything within a certain distance of the target. By increasing the acceptable distance you capture things that are more different.
- If you started with 100 pictures of garage interiors, extracted the common embedding for them, and then asked for images with similar embeddings, you would get other images that were similar to the interior of a garage. If you make the maximum distance small then you will get garages that are very similar to the garages in your original sample, but if you make the cutoff distance larger then things which are more and more different get included in the capture.
- By adjusting the distance you can adjust the amount of over-capture that you are willing to tolerate. All the stuff that gets captured will have to be reviewed by a human.
- This is just a way to capture a collection of photos that is relatively rich in garage interiors. If you make the cutoff relatively large then you will have more stuff to throw away, but you also capture more samples of garages that are more different from the sample you started with.
- So probably the thing to do is to adjust the cutoff value to capture stuff that you haven’t seen before but avoid having to sift through too much garbage to find the stuff you want.
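The cutoff idea from the last few comments can be sketched as a simple distance threshold. All the names here are hypothetical, and whether the cars actually use cosine distance is a guess:

```python
import numpy as np

def trigger_fires(image_emb, target_emb, max_distance):
    """Fire the snapshot trigger when the image embedding lies within
    max_distance (cosine distance) of the target embedding; a larger
    max_distance tolerates more over-capture."""
    a = image_emb / np.linalg.norm(image_emb)
    b = target_emb / np.linalg.norm(target_emb)
    return 1.0 - float(np.dot(a, b)) <= max_distance
```

With this shape of filter, tuning the over-capture rate really is just the single `max_distance` number, which fits the "adjust the cutoff" reasoning above.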
- You might be able to find evidence of them adjusting the cutoff if you can find successive uses of the same trigger label with different blobs attached.
- In order to train a neural network to reliably recognize garages you need a large collection of photos of different garages that actually exist in the real world, and you want that sample to include unusual garages to make sure your neural network will work in marginal cases. But to start out you might only have pictures of 1,000 or 10,000 different garages, when what you need is maybe 1 million photos or so, and you want that million photos to include unusual garages as well as common ones. So you take those thousand images, extract the embeddings, and look at what is common about them. Then you take that commonality and send it out to all the cars in the field, telling them to look for images which have an embedding similar to that commonality. The cars send you back a bunch of photos, and you have a bunch of people go through those photos to pick out the ones that are actually garages.
- Then you add those new human-curated photos to your collection of garage photos and repeat the process until you have 1 million images of garages.
- How big an effort it is depends on how good a filter the embeddings are. If the filter is generating 99.9% garbage then it becomes a big effort, but if it is generating 99% garage photos then the effort isn't too bad. Probably they start out with a relatively poor filter and it gets better as they grow the population of photos they have to train with. Towards the end you would expect it to be 99.9% good. If the filter never gets very good then you can't use it anyway.
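The effort arithmetic behind that comment is easy to make concrete: to end up with 1 million curated photos you review roughly 1,000,000 / precision candidate images, so filter precision dominates the human workload.

```python
# Review effort for 1,000,000 curated garage photos at different
# filter precisions (fraction of captured images that are garages)
target = 1_000_000
for precision in (0.001, 0.50, 0.99):
    reviewed = target / precision
    print(f"precision {precision:.1%}: review ~{reviewed:,.0f} images")
```

At 0.1% precision the reviewers face a billion candidates; at 99% it is barely more than the million they keep.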
- I guess you could think of this as a kind of shadow mode. It lets you test your neural network's ability to recognize a new category of stuff in the real world before you deploy it for actual use in the cars.
- And as a byproduct it generates a lot of photos that you can use for training.