

- map.
- feature map.
- both rows and columns after each value.
- Adversarial Networks (GANs).



kernels.

convolution operations.

represent active elements.



Fig: Computation pattern for Transpose Convolution
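The computation pattern in the figure can be illustrated with a small sketch. It assumes that stride 2 is realized by inserting a zero after each value along both rows and columns, and that the kernel then slides over the enlarged map without extra padding. The function names `zero_insert` and `count_multiplications` are illustrative and do not come from the poster.

```python
def zero_insert(x):
    """Insert a zero after each value along rows and columns (2H x 2W result)."""
    h, w = len(x), len(x[0])
    u = [[0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            u[2 * i][2 * j] = x[i][j]
    return u


def count_multiplications(h, w, n):
    """Count multiplications for an n x n window sliding over the 2h x 2w map.

    Returns (total, active). A multiplication is 'active' when the kernel tap
    lands on an original input value rather than on an inserted zero.
    """
    total = active = 0
    for p in range(2 * h - n + 1):          # output rows
        for q in range(2 * w - n + 1):      # output columns
            for a in range(n):
                for b in range(n):
                    total += 1
                    # Original values sit only at even (row, col) positions.
                    if (p + a) % 2 == 0 and (q + b) % 2 == 0:
                        active += 1
    return total, active


upsampled = zero_insert([[1, 2], [3, 4]])
total, active = count_multiplications(2, 2, 3)
```

For a 2×2 input and a 3×3 kernel, this sketch performs 36 multiplications, of which only 9 touch an original input value; the rest multiply by inserted zeros.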



Output feature map from the transpose convolution operation with stride 2

0.49

Proposed / 4 outputs

| Area (cell units) | Power (mW) |
|-------------------|------------|
| pgy               |            |
| 29413.37          | 19.23      |
| 29019.63          | 19.91      |
| 2010.00           | 10.01      |
| 54174.12          | 31.90      |
| 51217.06          | 37.57      |
| 78509.66          | 46.48      |
| 71270.24          | 56.38      |
| ogy               |            |
| 3105.55           | 2.93       |
| 5105.55           | 2.55       |
| 3070.52           | 2.96       |
| 5835.89           | 5.19       |
| 5645.68           | 5.70       |
| 5045.00           | 5.70       |
| 8966.66           | 7.77       |
| 8549.79           | 8.31       |

**Discussion and Limitations** 

• The delay, area and power requirements are reported for the 45nm and 14nm technology nodes using Synopsys Design Compiler.

• The implemented design considers an input size of 8 bits and a kernel size of 32 bits.

• The proposed method is more efficient in terms of delay, area and power consumption.

• The power consumption reported for the proposed method is for generating four pixels.

• Per element written to the output feature map, the average power consumption is much lower than that of the original method.

• The proposed method reduces the computational load by up to nearly 3.8× compared to the original implementation.

• The proposed method helps to scale existing deep learning models that use transpose convolution (TC) so that they can be implemented on handheld devices.

# Limitations

• The proposed method produces four pixels at a time, which results in computing unwanted elements if the output feature map has odd dimensions.

• Thus, the extra elements need to be avoided in future computations.

# **Future Works**

• Implement the proposed optimized transpose convolution operation on different Field-Programmable Gate Arrays (FPGAs) from Intel and AMD.

• Design a simple neural network that uses a transpose convolution layer on FPGAs and analyze its performance.
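The four-pixels-at-a-time behaviour and the odd-dimension limitation noted above can be sketched in plain Python. This is a minimal illustration rather than the implemented hardware design: it assumes stride 2 via zero insertion after each value, no extra padding, and a kernel split into four sub-kernels by row and column parity. All function names are hypothetical.

```python
def zero_insert(x):
    """Insert a zero after each value along rows and columns (2H x 2W result)."""
    h, w = len(x), len(x[0])
    u = [[0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            u[2 * i][2 * j] = x[i][j]
    return u


def transpose_conv_reference(x, k):
    """Conventional route: zero-insert, then slide the full n x n kernel."""
    u, n = zero_insert(x), len(k)
    out_h, out_w = len(u) - n + 1, len(u[0]) - n + 1
    return [[sum(k[a][b] * u[p + a][q + b] for a in range(n) for b in range(n))
             for q in range(out_w)] for p in range(out_h)]


def transpose_conv_segregated(x, k):
    """Segregated route: four outputs per step, no zero-inserted map needed.

    Returns (output, computed), where 'computed' counts every value produced,
    including those discarded because they fall outside an odd-sized output.
    """
    h, w, n = len(x), len(x[0]), len(k)
    out_h, out_w = 2 * h - n + 1, 2 * w - n + 1
    # sub[rp][cp] holds the kernel taps that align with real input values
    # for an output pixel whose (row, col) parity is (rp, cp).
    sub = [[[[k[a][b] for b in range(cp, n, 2)] for a in range(rp, n, 2)]
            for cp in (0, 1)] for rp in (0, 1)]
    y = [[0] * out_w for _ in range(out_h)]
    computed = 0
    for bi in range((out_h + 1) // 2):       # one 2 x 2 output block per step
        for bj in range((out_w + 1) // 2):
            for rp in (0, 1):
                for cp in (0, 1):
                    p, q = 2 * bi + rp, 2 * bj + cp
                    acc = 0
                    for i, row in enumerate(sub[rp][cp]):
                        for j, tap in enumerate(row):
                            r, c = bi + rp + i, bj + cp + j
                            if r < h and c < w:  # guard reads past the input edge
                                acc += tap * x[r][c]
                    computed += 1
                    if p < out_h and q < out_w:
                        y[p][q] = acc        # otherwise the value is discarded
    return y, computed
```

In this sketch the four sub-kernels together contain exactly the n² taps of the original kernel, so each block of four outputs costs n² multiplications instead of 4n². With a 3×3 input and a 4×4 kernel the output is 3×3, so the block-wise loop computes 16 values and keeps only 9, which is the odd-dimension overhead described under Limitations.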

## References

• Tida, V. S., Chilukoti, S. V., Hsu, S. H. Y., & Hei, X. (2023). Kernel-segregated transpose convolution operation. In T. X. Bui (Ed.), 56th Hawaii International Conference on System Sciences, HICSS 2023, Maui, Hawaii, USA, January 3-6, 2023 (pp. 6934–6943). ScholarSpace. https://hdl.handle.net/10125/103035

• Yazdanbakhsh, A., Samadi, K., Kim, N. S., & Esmaeilzadeh, H. (2018). GANAX: A unified MIMD-SIMD acceleration for generative adversarial networks. 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 650–661.