# The TETRISC SoC - A Robust Quad-Core High Resilience System

Junchao Chen, Rizwan Tariq Syed, Marko Andjelkovic, Lara Wimmer, Eckhard Grass, Markus Ulbricht, Milos Krstic

IHP, Im Technologiepark 25, Frankfurt (Oder), Germany

## MOTIVATION

- Increasing demand for real-time data processing in reliability-critical applications, such as aviation and aerospace.
- Overcoming limitations of traditional static fault mitigation methods.
- Requirements for real-time reliability monitoring networks.
- Addressing the dynamic reliability needs of systems to ensure optimal operation under normal and severe conditions.

## GOALS

- Develop an adaptable and resilient system for reliability-critical applications.
- Enable on-demand reconfigurable redundant system allocations under harsh conditions.
- Implement an on-board monitor network for enhanced system monitoring.
- Realize real-time dynamic tradeoffs between system reliability, power consumption and performance.

## TETRISC SoC Overview

- Reconfigurable quad-core SoC based on the open-source single-core microcontroller architecture PULPissimo, utilizing the RISC-V instruction set.
- Integration of multiple on-chip monitors for SEU (radiation), core aging, and temperature monitoring.
- Dynamic tradeoffs between reliability, performance, and power consumption.
- Intelligent HiRel framework controller for hybrid critical edge computing applications.
- Fabricated with 130nm IHP technology.
- Demonstrators available in both ASIC and FPGA implementations.

| Parameter               | Value                          |
|-------------------------|--------------------------------|
| Chip area               | 43.56 mm² (6.6 mm × 6.6 mm)    |
| Nominal clock frequency | 30 MHz                         |
| Power consumption       | < 1 W (estimated)              |
| Memory                  | 4 × 8192 × 40 bit SRAM         |
| Pads                    | 81 signal, 35 other            |

## Adaptiveness and Fault Tolerance

- The HiRel framework controller manages all core inputs and outputs, implementing various operation modes with N-Modular Redundancy (NMR) and clock gating.
- Includes a binary matrix-based programmable NMR majority voter that provides dynamic selection.
- Three operating mode groups: high performance, power saving, and fault tolerance.
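The mask-based voting idea can be illustrated with a minimal sketch. It assumes that each row of a binary matrix selects the cores forming one redundancy group and that a strict majority is required within the group. The function names, the matrix layout, and the handling of ties are assumptions made for illustration, not the controller's actual design.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def programmable_vote(outputs: Sequence[int],
                      mask: Sequence[int]) -> Tuple[Optional[int], bool]:
    """Vote over the core outputs selected by one binary mask row.

    Returns (voted_value, fault_detected). voted_value is None when the
    selected outputs have no strict majority (assumed tie handling).
    """
    if len(outputs) != len(mask):
        raise ValueError("outputs and mask must have the same length")
    selected = [out for out, bit in zip(outputs, mask) if bit]
    if not selected:
        raise ValueError("mask selects no core")
    counts = Counter(selected)
    value, votes = counts.most_common(1)[0]
    fault_detected = len(counts) > 1           # any disagreement in the group
    has_majority = votes * 2 > len(selected)   # strict majority only
    return (value if has_majority else None), fault_detected


def vote_groups(outputs: Sequence[int],
                matrix: Sequence[Sequence[int]]) -> List[Tuple[Optional[int], bool]]:
    """Apply one vote per matrix row (hypothetical group layout)."""
    return [programmable_vote(outputs, row) for row in matrix]


# Cores 0-2 form a TMR group, core 3 runs on its own (illustrative values).
results = vote_groups([5, 5, 7, 9], [[1, 1, 1, 0], [0, 0, 0, 1]])
```

In this sketch a two-core group can flag a mismatch but returns no voted value, while a three-core group masks a single disagreeing core.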





- Various user-defined and self-triggered fault-tolerant modes with the reliable monitor network.
- Task synchronization between different cores can be achieved in two clock cycles.
- Protection of all components outside the core through the use of TMR flip-flops.
- Operation modes can be determined from the pre-defined reliability thresholds.
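A threshold-driven mode choice could look like the following sketch. The monitored quantity, the threshold values, and the mapping from thresholds to DMR, TMR, and QMR are all hypothetical and chosen only to show the mechanism; the source does not state the actual thresholds or how they map to modes.

```python
from typing import Sequence, Tuple

# Hypothetical thresholds on a monitored error rate (arbitrary units).
DEFAULT_THRESHOLDS: Tuple[Tuple[float, str], ...] = (
    (100.0, "QMR"),
    (10.0, "TMR"),
    (1.0, "DMR"),
)


def select_mode(error_rate: float,
                thresholds: Sequence[Tuple[float, str]] = DEFAULT_THRESHOLDS,
                default: str = "high performance") -> str:
    """Return the mode of the highest threshold that error_rate reaches."""
    if error_rate < 0:
        raise ValueError("error_rate must be non-negative")
    # Check the largest threshold first so the strongest protection wins.
    for limit, mode in sorted(thresholds, reverse=True):
        if error_rate >= limit:
            return mode
    return default
```

Keeping the thresholds in a table rather than in nested conditions makes it easy to swap in user-defined values at run time, which is the kind of reconfiguration the bullet above describes.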

## FPGA Demonstrator with Fault Injection & Operation Mode Reconfiguration

- The quad-core system, integrated with a fault injection model, is realized on an ARTIX-7 FPGA using an IHP-designed board.
- A fault injection model is embedded within the cores at the RTL level to simulate core error behavior.
- The system includes three switches to activate the DMR, TMR, and QMR modes, and individual switches to trigger faults in each core.
- The LCD shows the outputs of the four cores, the status of the input switches, and the operational status of the cores.
- In the default mode, when a fault is injected, the affected core halts.
- In DMR mode, faults can be detected as they occur, but the system cannot determine a majority output.
- TMR mode allows for the detection and selection of majority outputs when faults occur in a single core.
- In QMR mode, the system may detect and select the majority outputs even when two cores have errors.

↑ Block diagram of the HiRel framework controller. The LCD provides real-time updates on the operation of the quad-core system. Typically, the four cores execute distinct tasks, which are displayed as C0, C1, C2, and C3 respectively. Multiple switches facilitate fault injection and mode reconfiguration. The blue block indicates switch control for the board, while the red block depicts the operational status of the cores.

| LCD character | C3 | C2 | C1 | C0 |
|---------------|----|----|----|----|
| 0             | 0  | 0  | 0  | 0  |
| 1             | 0  | 0  | 0  | 1  |
| 2             | 0  | 0  | 1  | 0  |
| …             |    |    |    |    |
| 9             | 1  | 0  | 0  | 1  |
| :             | 1  | 0  | 1  | 0  |
| ;             | 1  | 0  | 1  | 1  |
| <             | 1  | 1  | 0  | 0  |
| =             | 1  | 1  | 0  | 1  |
| >             | 1  | 1  | 1  | 0  |
| ?             | 1  | 1  | 1  | 1  |

↑ The meaning of the display on the LCD.
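The listed rows are consistent with each LCD character being the ASCII code 0x30 plus a 4-bit value whose bits correspond to C3 down to C0. That relationship is inferred from the table rather than stated in the source, and the helper name below is hypothetical. Under that assumption, a character can be decoded as follows:

```python
from typing import Dict


def decode_lcd_char(ch: str) -> Dict[str, int]:
    """Map one LCD status character ('0'..'?') to per-core bits.

    Assumes the character is ASCII 0x30 plus a 4-bit value, with
    C3 as the most significant bit and C0 as the least significant.
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    value = ord(ch) - 0x30
    if not 0 <= value <= 0xF:
        raise ValueError("character outside the '0'..'?' range")
    return {f"C{i}": (value >> i) & 1 for i in range(3, -1, -1)}
```

For example, `decode_lcd_char(";")` yields C3 = 1, C2 = 0, C1 = 1, C0 = 1, matching the `;` row of the table.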

This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 72232, and the German Federal Ministry for Education and Research through the Open6G-Hub project (Grant no. 16KISK009) as well as the Scale4Edge project (Grant no. 16ME0134).



↑ Example of the mode configuration. Cores 0, 1 and 2 form the core-level TMR