Initial hypothesis and general objectives
The general objective of this project is the efficient implementation of vision processing for the IoT, given the strong constraints on energy and computing resources of IoT platforms. Our initial hypothesis is that a successful approach must be based on re-partitioning the embedded vision system so as to resolve the power-performance trade-off. A feasible approach is the development of an Analog-to-Information system model, which combines feature-enhancing filtering in the analog domain with hardware-aware high-level inference engines.
In order to achieve this goal, we need to work along several research lines. First, bioinspired focal-plane processing models need to be explored. In parallel, the implementation of in-pixel and off-array functionalities needs to be studied. Simulation of the embedded vision system needs to be multilevel and based on interoperable descriptions, to allow joint optimization at the different levels of the hierarchy. The analysis of feature classification and machine learning algorithms needs to be connected to their hardware implementation. System-level aspects need to be considered as well, for successful system verification, integration, and validation in a relevant application scenario. These research lines will be organized in pursuit of the set of partial objectives described in the next section.
Specific objectives
Specific objectives of IMSE’s project
O1-1: High-dynamic-range single-shot image sensor, including auto-exposure and tone-mapping functionalities at the focal plane. To the best of our knowledge, no academic or commercial image sensor integrates all these characteristics. This sensor will employ two intertwined photodiodes per pixel and minimal extra circuitry beyond the readout elements, in order to achieve at least VGA resolution (640x480 px) with a pixel pitch below 8 µm. It will be designed, if possible, in a CMOS image sensor technology such as LFoundry 110 nm.
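As an illustration of the intended focal-plane functionality, the following Python sketch shows one way two photodiode readings with different effective sensitivities could be fused into a high-dynamic-range value and then tone-mapped. The saturation level, sensitivity ratio, and logarithmic curve are illustrative assumptions, not the pixel circuit to be designed.

```python
import numpy as np

SATURATION = 4095   # assumed 12-bit full-scale readout
RATIO = 16.0        # assumed sensitivity ratio between the two photodiodes

def fuse_hdr(high_sens, low_sens):
    """Take the high-sensitivity reading unless it saturates."""
    high_sens = np.asarray(high_sens, dtype=np.float64)
    low_sens = np.asarray(low_sens, dtype=np.float64)
    return np.where(high_sens < SATURATION, high_sens, low_sens * RATIO)

def tone_map(radiance, out_levels=256):
    """Compress the fused HDR range into a displayable range (log curve)."""
    norm = np.log1p(radiance) / np.log1p(radiance.max())
    return np.round(norm * (out_levels - 1)).astype(np.uint8)

# Synthetic 640x480 scene spanning five decades of intensity.
scene = np.logspace(0, 5, 640 * 480).reshape(480, 640)
high = np.minimum(scene, SATURATION)           # high-sensitivity diode
low = np.minimum(scene / RATIO, SATURATION)    # low-sensitivity diode
ldr = tone_map(fuse_hdr(high, low))
```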
O1-2: Design of a hardware-aware algorithm based on compressed learning. This algorithm will be tightly integrated with the chip described in the following objective. The algorithm will implement face recognition as a case study of particular interest for IoT devices in secured environments. We aim to distinguish at least 50 people under different backgrounds, poses, and illumination conditions.
O1-3: Smart image sensor generating compressed samples. This sensor will be employed to recognize faces in cooperation with the algorithm described in the previous objective. It will have at least QVGA resolution (320x240 px) and will produce compressed samples at an equivalent rate of 30 fps. It will be designed, if possible, in a CMOS image sensor technology such as LFoundry 110 nm.
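A minimal software sketch of the sampling-plus-learning pipeline of O1-2/O1-3 follows: the "sensor" outputs compressed samples y = Φx, and classification operates directly on y, without image reconstruction. The 32x32 face crop, the number of measurements M = 128, the ±1 sensing matrix, and the nearest-prototype classifier are assumptions for illustration, not the designs to be developed.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32 * 32          # assumed pixels per face crop
M = 128              # assumed number of compressed measurements
Phi = rng.choice([-1.0, 1.0], size=(M, N))   # fixed pseudo-random sensing matrix

def compress(image):
    """Emulate the smart sensor: one vector of compressed samples per frame."""
    return Phi @ image.reshape(-1)

def train_prototypes(images, labels):
    """Compressed learning: one mean compressed vector per identity."""
    ys = np.array([compress(im) for im in images])
    return {p: ys[np.array(labels) == p].mean(axis=0) for p in set(labels)}

def classify(image, prototypes):
    """Assign the identity whose prototype is closest in the compressed domain."""
    y = compress(image)
    return min(prototypes, key=lambda p: np.linalg.norm(y - prototypes[p]))
```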
O1-4: Implementation of a smart camera trap with network connectivity, capable of recognizing at least three different species with high reliability (top-1 accuracy greater than or equal to 80%) in daylight and night-time conditions at remote locations. A major focus will be placed on energy consumption: the camera is expected to operate on battery, with no maintenance, for at least one month.
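A back-of-the-envelope calculation clarifies what one month on battery implies for the average power budget. The battery capacity below is an assumed figure, not a project specification.

```python
# Rough power budget for a month of unattended operation, assuming a
# 3.7 V, 10 Ah (37 Wh) battery pack.
battery_wh = 37.0
hours = 30 * 24                      # one month of continuous deployment
budget_w = battery_wh / hours
print(f"average power budget: {budget_w * 1000:.0f} mW")   # ~51 mW
# Such a budget is only reachable by duty cycling: a low-power wake-up
# stage most of the time, with short bursts of capture and inference.
```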
Specific objectives of USC’s project
O2-1: Visual Object Detection and Tracking on Embedded Devices. The previous project, funded under the Spanish R&D&i call (RTI2018-097088-B), focused on embedded GPUs, with power consumption in the 10 W-20 W range. The current project targets FPGAs, where the goal is to run deep learning object detection and tracking algorithms at video rate with less than 5 W.
O2-2: Mixed-Mode On-Chip CMOS Hyperdimensional Computing. Implementation of on-chip primitives of hyperdimensional computing, i.e., binding, bundling, and permutation, as well as classification and hyperdimensional encoding for event sensors. The implementation will be made in a high-voltage 0.18 µm CMOS technology, seeking compatibility with FeRAM technology. The goal is to run on-chip hyperdimensional computing with hypervectors of at least 8192 components and to meet video-rate processing with a power consumption of 100 mW.
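For reference, the three primitives have simple functional definitions on dense binary hypervectors, sketched below in Python with the 8192-component dimensionality of the objective. This only fixes the intended functional behaviour; the on-chip mixed-mode realization will differ.

```python
import numpy as np

D = 8192                       # hypervector dimensionality, as in O2-2
rng = np.random.default_rng(0)

def random_hv():
    return rng.integers(0, 2, size=D, dtype=np.uint8)

def bind(a, b):
    """Binding: component-wise XOR (self-inverse, similarity-destroying)."""
    return np.bitwise_xor(a, b)

def bundle(hvs):
    """Bundling: component-wise majority vote (similarity-preserving)."""
    return (np.sum(hvs, axis=0) * 2 > len(hvs)).astype(np.uint8)

def permute(a, shift=1):
    """Permutation: cyclic shift, used e.g. to encode sequence order."""
    return np.roll(a, shift)

def hamming_similarity(a, b):
    return 1.0 - np.mean(a != b)

# Toy classification: a bundled class prototype stays close to its members.
x, y, z = random_hv(), random_hv(), random_hv()
proto = bundle([x, y, z])
assert hamming_similarity(proto, x) > hamming_similarity(proto, random_hv())
```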
O2-3: On-Chip Event Generation and Time-of-Flight Sensors. Implementation of on-chip synchronous event generation and of time-of-flight sensors with multiple frequency measurements to solve depth ambiguities and cope with multipath interference. Both designs will be based on 4T-APS CIS technology to minimize noise. We pursue a chip for synchronous event generation with a dynamic range of at least 100 dB, a power consumption below 100 nW/px, and a rate of at least 1000 events per second. The goal for the ToF chip is 3D reconstruction with at least 16 different simultaneous frequencies.
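The role of the multiple frequencies can be made concrete with the standard phase-depth relation d = c·φ/(4πf), where the phase φ is only measured modulo 2π, so d is only known modulo the unambiguous range c/(2f). The sketch below, with assumed modulation frequencies and search range, shows how two frequencies intersect at the true depth.

```python
import numpy as np

C = 299_792_458.0                      # speed of light, m/s
freqs = np.array([20e6, 23e6])         # two assumed modulation frequencies

def candidates(phi, f, max_range=60.0):
    """All depths consistent with phase phi at frequency f up to max_range."""
    amb = C / (2 * f)                  # unambiguous range at this frequency
    base = C * phi / (4 * np.pi * f)
    ks = np.arange(0, int(max_range / amb) + 1)
    return base + ks * amb

true_d = 18.5                                          # metres (assumed)
phis = (4 * np.pi * freqs * true_d / C) % (2 * np.pi)  # measured phases

c0 = candidates(phis[0], freqs[0])
c1 = candidates(phis[1], freqs[1])
# The pair of candidates that agree best gives the unwrapped depth.
i, j = np.unravel_index(np.argmin(np.abs(c0[:, None] - c1[None, :])),
                        (len(c0), len(c1)))
print(f"recovered depth: {c0[i]:.2f} m")   # ~18.50
```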
O2-4: Strategies for low-power circuit design. This is a transversal objective to be pursued within objectives O2-1 to O2-3. Throughout the design hierarchies (FPGA and custom mixed-mode CMOS design), special emphasis will be placed on reducing the overall power consumption of the devices, both at low level (power gating, low-power design) and at high level (smart memory access and control strategies).
Specific objectives of UPCT’s project
O3-1: Development of a methodology to evaluate, at the architectural level, the performance and dataflow of a mixed-signal convolutional neural network, combining SPICE, Verilog-A, and functional models. This methodology will be used to design, at this level, a scalable, small-size neural network able to classify images of between 8x8 and 16x16 pixels faster than real time. The network will be tested against resized images from known databases such as MNIST handwritten digits or CIFAR-10.
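As an example of the functional-model tier of such a methodology, the sketch below resizes a dataset image to 8x8 and runs one quantized 3x3 convolution purely in software; its outputs could serve as golden references for the Verilog-A and SPICE views of the same operation. The resize method, layer, and quantization scheme are illustrative assumptions.

```python
import numpy as np

def resize_nn(img, size=8):
    """Nearest-neighbour resize of a square image to size x size."""
    idx = (np.arange(size) * img.shape[0]) // size
    return img[np.ix_(idx, idx)]

def quantize(w, bits=4):
    """Symmetric uniform quantization to signed `bits`-bit integer levels."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale), scale

def conv3x3(img, kernel):
    """Valid 3x3 convolution: the core operation of the mixed-signal array."""
    h, w = img.shape
    out = np.empty((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * kernel)
    return out

rng = np.random.default_rng(0)
digit = rng.random((28, 28))            # stand-in for an MNIST image
wq, scale = quantize(rng.standard_normal((3, 3)))
golden = np.maximum(conv3x3(resize_nn(digit), wq) * scale, 0)   # conv + ReLU
```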
O3-2: Prototyping of an in-memory computing array in a 180 nm standard CMOS technology. The memory grid will be based on modified SRAM cells and will have a size of 9x4 bit cells, in order to implement convolution with 3x3 masks and weights quantized to up to 4 bits.
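The 9x4 organization has a direct functional reading: nine weights, one 4-bit word each. A minimal sketch of that mapping follows, decomposing each weight into bit planes whose dot products are combined with binary weights of 2^b; unsigned weights are assumed for simplicity, and the actual SRAM-based analog computation will of course differ.

```python
import numpy as np

BITS = 4

def to_bitplanes(weights):
    """9 unsigned weights in [0, 15] -> (9, 4) array of bits, LSB first."""
    w = np.asarray(weights, dtype=np.uint8).reshape(9)
    return (w[:, None] >> np.arange(BITS)) & 1

def mac(patch, bitplanes):
    """Dot product of a flattened 3x3 patch with bit-plane weights."""
    x = np.asarray(patch, dtype=np.int64).reshape(9)
    # Each bit plane contributes its dot product scaled by 2**b.
    return sum((x * bitplanes[:, b]).sum() << b for b in range(BITS))

patch = np.arange(9)                              # toy 3x3 input patch
weights = np.array([3, 7, 1, 0, 15, 2, 4, 8, 5])  # toy 4-bit mask
planes = to_bitplanes(weights)
assert mac(patch, planes) == int(np.dot(patch, weights))
```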
O3-3: Implementation of the drivers and readout circuits required to provide functionality and reconfigurability to the computing grid. These circuits will manage data reuse, configure the number of filter channels and the weight quantization, implement batch normalization, select differential or single-ended, digital or analog outputs, etc.
O3-4: Prototyping of a CMOS convolutional neural network based on the design from objective O3-1. To obtain an affordable chip size in a 180 nm standard CMOS technology, the network will work with low-resolution 8x8 px input images. It will infer single objects from resized versions of known databases such as MNIST or CIFAR-10 faster than real time.