Smart Camera Gimbal Bot
-----------------------

This is the latest update in the Tuco Flyer project, a DIY indoor wirecam I'm building to make it easier for all of you to hang out with my furry co-star Tuco.

This time we'll build a movable camera platform with sensors and computer vision, so it can detect and track cat-like objects.

[ Scanlime bumper ]

I want this bot to work well on its own, or with a crowd of people controlling it remotely, so one goal at this stage of the design is to have a good mixture of automatic and manual control options. A tracking algorithm stabilizes the camera around any rectangle of pixels, and meanwhile an object detector labels the scene with rectangles we might find interesting.

We'll start by building this tall gimbal platform I call the "Flyer". It has a camera, sensors, and lights, but the video streaming and computer vision happen on the ground.

Previous episodes introduced the Tuco Flyer project, the modifications I made to get live high-quality video out of the camera gimbal, and the reverse engineering it took to control the gimbal.

In the last episode we prototyped the robotic winches that will be pulling our flyer bot around the shop.

The Flyer hangs by four nylon ropes, so it has a rectangular space it can move in. A wire leash drops down to the Flyer as well, for power, video, and network.

These wires will need to stay above the height of people and furniture, but I'd like the camera itself to have some freedom to move through space that might be obstructed.

So the Flyer has a half-meter carbon fiber boom that hangs the camera gimbal below a "sensor saucer". The saucer carries collision avoidance sensors designed to sense infrared reflection in the area around the camera.

3D printed clamps attach the pair of carbon composite tubes to a camera mount on bottom and a rope tie-off ring on top.

All of the bots in this project use the same TI microcontroller board. This acts like a real-time bridge between analog and digital sensors and the 100BASE-T Ethernet network. Sensor readings end up in a stream of outgoing UDP packets, and the board converts another type of UDP packet to and from the gimbal's flavor of serial packet.

The sensor saucer includes three long-range LIDAR sensors, which will help determine the Flyer's height above the floor, and eight Sharp IR distance triangulation sensors to detect obstacles around the camera.
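
As a rough illustration of the sensor side of that UDP bridge, here's a minimal Python sketch of decoding one sensor datagram on the ground. The packet layout, the field names, and the `decode_sensor_packet` helper are all hypothetical; the real firmware's format isn't described in this episode.

```python
import struct

# Hypothetical layout (NOT the real firmware format): little-endian
# uint32 sequence number, eight uint16 IR readings, then three
# uint16 LIDAR ranges in millimeters.
SENSOR_FORMAT = "<I8H3H"
SENSOR_SIZE = struct.calcsize(SENSOR_FORMAT)

def decode_sensor_packet(payload: bytes) -> dict:
    """Unpack one datagram payload into named sensor fields."""
    if len(payload) != SENSOR_SIZE:
        raise ValueError(f"expected {SENSOR_SIZE} bytes, got {len(payload)}")
    fields = struct.unpack(SENSOR_FORMAT, payload)
    return {
        "sequence": fields[0],
        "ir": list(fields[1:9]),
        "lidar_mm": list(fields[9:12]),
    }

# Build a fake packet the way a test harness might, then decode it.
example = struct.pack(SENSOR_FORMAT, 7, *range(8), 1500, 1510, 1490)
reading = decode_sensor_packet(example)
```

In a real receiver the payload would come from `socket.recvfrom()` on a UDP socket, and the sequence number would let the controller notice dropped or reordered packets.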

Centrally located, the Sensor Shrimp here includes a Parallax X-Band motion detector and an Adafruit breakout board for the BNO055 orientation sensor. The X-Band sensor can detect humans or animals sneaking up on the bot, which I thought would be fun. The orientation sensor will be useful feedback for the motion control later on.

The Flyer has some LEDs for communicating with humans in the area; a stacked set of 3 rings above the gimbal can give some indication of the robot's mode and aim to the people nearby.

Each ring is a strip of 36 LEDs. I used the APA102-style addressable pixel. They sandwich between opaque PLA carriers, behind a diffuser made from translucent NinjaFlex filament.

The top of the saucer section is made from an opaque PLA disc which holds four small 7-pixel LED strips at right angles. These are left un-diffused, since I expect to use them for visibility from an overhead camera, and sharp points of light may be useful for locating the bot with software later on.

I left in a mistake here: there was a dead LED on one of the strips, and it would have been easier to replace the whole strip but I didn't have a long enough spare, so I spliced in a small section of working LED strip. This was after a failed try at replacing the individual LED with a hot air station, as they're very heat-sensitive.

The LEDs look nice running a test pattern here, but there's still some work to be done in figuring out how to communicate status through these LEDs, and then writing the animation code behind that.

The SDI video leaves the modified gimbal through an SMA cable, which this bracket securely holds.

Those LEDs, the sensors, and the microcontroller all need 5 volts, but the gimbal motors will run on 12 volts. The leash supplies this higher voltage, and the same bracket also holds four separate Murata DC-DC converter modules for the 5 volt subsystems. It was nice to have separate overcurrent protection for the microcontroller, the infrared sensors, and the groups of LEDs.

The bot draws around 1.5 amps at 12 volts over this 100 feet of speaker cable in the leash, which seemed like the right tradeoff between complexity and efficiency for this design. The leash weight isn't critical, or I'd be more interested in Power over Ethernet. In any case, the bot needs a separate coaxial cable for the uncompressed HD video.
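
To get a feel for that tradeoff, here's a back-of-the-envelope estimate of the voltage drop in the leash. The wire gauge isn't stated above, so the 16 AWG figure (roughly 4 ohms per 1000 feet of copper) is purely an assumption for illustration, and `leash_drop` is a hypothetical helper.

```python
# Assumed, not from the build: 16 AWG copper at about 20 C.
OHMS_PER_1000_FT = 4.016
LEASH_FEET = 100      # one-way cable length, from the narration
CURRENT_A = 1.5       # approximate draw at 12 V, from the narration

def leash_drop(feet: float, current: float,
               ohms_per_1000ft: float = OHMS_PER_1000_FT):
    """Return (voltage drop, power lost) for a two-conductor cable run."""
    # Current flows out on one conductor and back on the other,
    # so the resistive path is twice the cable length.
    round_trip_ohms = 2 * feet * ohms_per_1000ft / 1000
    drop_v = current * round_trip_ohms
    loss_w = current * drop_v
    return drop_v, loss_w

drop_v, loss_w = leash_drop(LEASH_FEET, CURRENT_A)
print(f"drop: {drop_v:.2f} V, lost in cable: {loss_w:.2f} W")
```

Under that assumed gauge the cable drops a bit over one volt, which is tolerable for 12 volt motors and for DC-DC converters with some input headroom. Thinner wire would change the picture quickly.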

This cable leads back to a Blackmagic capture card on the PC which is responsible for live-streaming as well as running the Bot-Controller software and the computer vision.

This PC has a very nice GPU which spends about half its time running the object detection algorithm, but it also helps out with video encoding.

The leash securely but softly mounts to the carbon fiber rods using this combination of a Ninjaflex ring plus a two-piece PLA clamp.

Now the bot components all stack, bolt, and press together. Right now the wires and internals are exposed, but there's room to design a cover, probably a plastic or paper sheet with a holder that can snap onto the exposed composite rods.

[ Assembly montage ]

The completed bot weighs 1.3 kilograms, a lot of that coming from the metal in the camera and gimbal assembly.

While reverse-engineering and modifying the gimbal firmware, we used Python to speak its serial protocol. This code needed to be rewritten in Rust and integrated with the Bot-Controller software.

The controller implements all of the robot's behavior, including the object tracking we're adding in this episode. It communicates with the microcontrollers over UDP and with the rest of the world over WebSockets and HTTP.

With the QR code it generates, I can log into a web interface with my phone, which helps for configuration and control.

For the new computer vision features, I could have built a standalone program, or I could have tried to get all of the libraries I needed working with Rust and built something directly into the Bot-Controller. But the thing that seemed both easier and more useful, to me, was to make a plugin for the open source video streaming software I like to use anyway.

These edited videos are condensed down from dozens of scanlime-in-progress livestreams, and OBS, or Open Broadcaster Software, is what I've been using to produce those streams.

OBS can handle input and output and video encoding, and this obs-TucoFlyer filter plugin can use the GPU to capture lower-resolution snapshots of each frame for the computer vision algorithms to work on, and then the filter can render overlays on top of the video according to a layout provided by the Bot-Controller.

The earliest code you're seeing here was interacting with video while it was still in system memory, but the current code uses the GPU to do a separate high-quality downsample operation for the resolution needed by each computer vision algorithm.

Before too long I had the object detector wired up to OBS, but I was stuck with printf debugging. I needed a way to put text on the screen. I decided there should be a single texture atlas with all the characters and symbols I need, and the Bot-Controller would send coordinates that become a list of quadrilaterals to render. I used the BMFont tool to start my texture atlas, and there was even a Rust library already for plotting text using the metadata from BMFont.

On the OBS side, the whole overlay can be drawn efficiently with a single vertex buffer. On the Rust side, I have a simple library of drawing functions that result in a list of quadrilaterals broadcast to the WebSocket clients.
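
Here's a small Python sketch of the idea of turning a text string into a list of textured quadrilaterals. The `Glyph` fields mirror the per-character metrics that BMFont exports (position and size in the atlas, offsets, and advance), but the `Quad` type, the `layout_text` function, and the sample numbers are hypothetical and not taken from the actual Rust code.

```python
from typing import NamedTuple

class Glyph(NamedTuple):
    x: int          # left edge in the atlas, pixels
    y: int          # top edge in the atlas, pixels
    width: int
    height: int
    xoffset: int    # shift from the pen position when drawing
    yoffset: int
    xadvance: int   # how far the pen moves after this glyph

class Quad(NamedTuple):
    dst: tuple      # (x, y, w, h) on screen
    src: tuple      # (x, y, w, h) in the texture atlas

def layout_text(text, glyphs, pen_x, pen_y):
    """Convert a string to quads, one per character found in the atlas."""
    quads = []
    for ch in text:
        g = glyphs.get(ch)
        if g is None:
            continue  # character not in the atlas: skip it
        quads.append(Quad(
            dst=(pen_x + g.xoffset, pen_y + g.yoffset, g.width, g.height),
            src=(g.x, g.y, g.width, g.height),
        ))
        pen_x += g.xadvance
    return quads

# Made-up metrics for two characters.
glyphs = {"A": Glyph(0, 0, 8, 10, 1, 2, 9), "B": Glyph(8, 0, 7, 10, 0, 2, 8)}
quads = layout_text("AB", glyphs, pen_x=100, pen_y=50)
```

Because every quad samples from the same atlas texture, a renderer can pack the whole list into one vertex buffer and draw it in a single call, which matches the approach described above.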

Back to computer vision, one type of tracking I could have used is a sparse optical flow algorithm like OpenCV's KLT tracker shown here. It works well even with very low-end CPUs when you're tracking few enough points. With a GPU we could upgrade to dense optical flow, where we get motion vectors for each pixel. But better yet, we could use a correlation tracker to look for the best match for our region of interest from frame to frame.

So that's what you're seeing here; it isn't programmed to find me specifically, it's just looking for a match to its evolving reference image.

To find Tuco, I could have used a simpler computer vision algorithm that only detects one class of object. But I've been excited about this YOLO algorithm; You Only Look Once. It's a relatively new neural network architecture which looks at the image once and converts it directly to rectangular bounding boxes for objects of many classes. Finally it's possible to let a computer "see" many types of objects in real-time, using "only" the processing power in a high-end GPU. This seems like it could enable fun new ways for viewers to interact with Tuco and the robot.

I eventually want to train my own network which detects cats and objects specific to my shop, but right now I'm using the pre-trained YOLOv2 network based on the COCO image dataset. This is where the object labels come from, and the bot's current idea of what those objects look like. It's actually sourced from a lot of random people's Flickr photos, check out the paper.

The object detector generates a lot of rectangles of varying quality. The visual representation for those rectangles evolved, and eventually I started giving the better predictions thicker bounding rectangles, and I added text labels to rectangles above a set quality threshold. These predictions are jumpy, since they're a direct reflection of the neural network's output for each frame. This is intended to be debug output, so it was more useful to see the raw results of the algorithm than to try and smooth out these particular rectangles.
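
That display rule can be sketched in a few lines of Python. The threshold, the thickness scale, and the `style_detections` name are placeholder assumptions, not the values or code used in the Bot-Controller.

```python
def style_detections(detections, label_threshold=0.5, max_thickness=6.0):
    """Pick an outline thickness and optional label for each detection.

    detections: iterable of (label, probability, rect) tuples, where
    probability is the detector's confidence in the range 0..1.
    """
    styled = []
    for label, prob, rect in detections:
        # Better predictions get thicker outlines, never thinner than 1.
        thickness = max(1.0, prob * max_thickness)
        # Only label rectangles above the quality threshold.
        text = f"{label} {prob:.2f}" if prob >= label_threshold else None
        styled.append({"rect": rect, "thickness": thickness, "text": text})
    return styled

styled = style_detections([
    ("cat", 0.9, (0.1, 0.2, 0.3, 0.3)),
    ("chair", 0.1, (0.5, 0.5, 0.2, 0.4)),
])
```

Nothing here smooths the boxes between frames, which is consistent with treating them as raw debug output.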

The tracker and detector algorithms are working together here. The region of interest is shown by a yellow rectangle at this point, and it's updated every frame by the correlation tracker. You'll also see me steer this rectangle manually with a gamepad sometimes. The object detector also runs on every frame, but it may or may not see anything worth tracking.

If you can read the jumpy text on the region of interest, you'll see values labeled "PSR", "age", and "area". PSR stands for peak-to-sidelobe ratio, which is the quality metric generated by our correlation tracker as it evaluates pixel matches in the frequency domain. Higher numbers mean the correlation tracker has more high-contrast features to lock onto. Age is the number of frames since the tracker was reset to a new reference location by the object detector or the manual joystick controls. And area is just the size of the rectangle in the software's unit convention.
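
For a concrete sense of the PSR number, here's one common way to compute it from a correlation response map: compare the peak against the mean and standard deviation of everything outside a small window around the peak. This is a generic sketch in plain Python; the tracker library's exact formula and window size may differ, and `exclude_radius` is an assumed parameter.

```python
import math

def peak_to_sidelobe_ratio(response, exclude_radius=1):
    """PSR = (peak - mean(sidelobe)) / std(sidelobe) for a 2D response map."""
    rows, cols = len(response), len(response[0])
    # Find the peak value and its location.
    peak, pr, pc = max(
        (response[r][c], r, c) for r in range(rows) for c in range(cols)
    )
    # The sidelobe is everything outside a square window around the peak.
    sidelobe = [
        response[r][c]
        for r in range(rows)
        for c in range(cols)
        if abs(r - pr) > exclude_radius or abs(c - pc) > exclude_radius
    ]
    if not sidelobe:
        raise ValueError("response map is too small for this exclude_radius")
    mean = sum(sidelobe) / len(sidelobe)
    std = math.sqrt(sum((v - mean) ** 2 for v in sidelobe) / len(sidelobe))
    return (peak - mean) / std if std > 0 else float("inf")

# Toy 5x5 response: a checkerboard of background values with a sharp
# peak in the center.
response = [[0.1 if (r + c) % 2 == 0 else 0.3 for c in range(5)] for r in range(5)]
response[2][2] = 1.0
psr = peak_to_sidelobe_ratio(response)
```

A sharp, isolated peak over a quiet background gives a high PSR. A smeared or ambiguous match lowers it, which is why it works as a confidence signal.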

The pre-trained YOLO model has been doing pretty well at detecting Tuco, but this toy rat has been a reliable source of false positives. Until I get around to customizing the model, I'm keeping the rat out of sight.

The tracking algorithm has been evolving from a single PID controller, which tried to keep the region of interest centered, to the current system, which uses several PID controllers tied to the horizontal and vertical margins of the screen. This has been an attempt at allowing more of a rule-of-thirds framing as well as some hysteresis so Tuco can move around the frame. Keeping up with him is still a work in progress, and it's been especially hard while the bot isn't quite mobile yet.
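
The margin idea can be sketched roughly like this in Python. It uses a textbook PID plus a simple dead band as a stand-in for the hysteresis mentioned above: the error stays at zero while the region of interest sits inside the margins, and only grows once an edge crosses one. The gains, the normalized coordinate range, and the `margin_error` helper are assumptions for illustration, not the real controller.

```python
class PID:
    """Textbook PID controller; the gains used below are untuned placeholders."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # No derivative term on the first sample, since there is no history.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def margin_error(lo: float, hi: float, margin_lo: float, margin_hi: float) -> float:
    """Error along one axis, given the ROI's edges and the allowed band.

    Coordinates are assumed to be normalized so the frame spans -1..1.
    If the ROI is wider than the band, the low edge takes priority.
    """
    if lo < margin_lo:
        return lo - margin_lo   # negative: ROI has crossed the low margin
    if hi > margin_hi:
        return hi - margin_hi   # positive: ROI has crossed the high margin
    return 0.0                  # inside the band: leave the camera alone

pan_pid = PID(kp=2.0, ki=0.0, kd=0.0)
error = margin_error(0.5, 0.9, -0.6, 0.6)   # ROI poking past the right margin
pan_rate = pan_pid.update(error, dt=1 / 30)
```

A second controller of the same shape would handle the vertical axis, and moving the margins is what shifts the framing away from dead center.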

The jittery rectangles are okay for debug, but I also wanted a less visually noisy way to communicate the region of interest rectangle. I could have put a low-pass filter on its position, smoothing it out. But a naive filter would have made the movement appear laggy. Instead of focusing on filtering here, I thought it would be nice to try using that jittery data to drive a particle system which generates less visual noise than the rectangles. So, that's how we have cat-sprites now circling the region of interest.
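
To see why a naive low-pass filter looks laggy, here's a tiny Python sketch of a first-order exponential filter responding to a sudden jump in position. The `low_pass` helper and the `alpha` value are illustrative assumptions, not something taken from the project.

```python
def low_pass(samples, alpha):
    """First-order exponential smoothing; smaller alpha means smoother and slower."""
    out = []
    y = samples[0]
    for x in samples:
        y += alpha * (x - y)   # move a fraction of the way toward the input
        out.append(y)
    return out

# The target jumps from 0 to 1 at the third sample.
step = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
smoothed = low_pass(step, alpha=0.25)
print(smoothed)
```

Four frames after the jump the filtered value still hasn't reached 70 percent of the way to the new position. Raising `alpha` reduces the lag but lets the jitter back in, which is the tradeoff that made a particle system more appealing here.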

The very simple particle dynamics are implemented on the Rust side, and the sprites render using the same system responsible for text and rectangles. We do have plenty of GPU power for visual effects if we wanted to add some shaders to the OBS plugin, but I didn't quite reach that point.

We've been talking about my cat Tuco, but you may have noticed some guest cats in this episode and the last one. I thought I could help out these two critters who needed a home for a couple months. This black cat is Luna, and the fuzzy marshmallow is Cloud. Unfortunately they and Tuco never got along well, and there were other complications. Cloud broke his leg somehow, and this is Cloud again, almost managing to jump out the window that's just off the top of the frame. So we were all relieved when they finally flew back to their real mom on a tiny plane.

And that's about it for this episode. The next major milestone will be getting all the winches holding the Flyer and starting to add 6-degree-of-freedom motion into this setup. I just printed the parts for a counterweight to hold the leash out of the way, and there's more software work to do so we can make the LEDs and sensors useful.

Thank you for watching, and a special thanks to everyone who supports this series via Patreon, Liberapay, or by sharing these videos with your friends. It's a huge amount of work to do everything myself, from the mechanical to the software to the video and audio, and this series is truly made possible by your support. Even donating a dollar a month, or telling one friend who you think would enjoy this content, helps so much!

The software and CAD designs are open source, and you can find the GitHub link in the video description. And if you still want more, check out the scanlime-in-progress livestreams on my channel. If you click the YouTube notification bell or follow @ScanlimeLive on Twitter, you'll find out about them as they happen.

Happy hacking, folks!