Functional and dynamic programming in the design of parallel prefix networks

MARY SHEERAN

doi:10.1017/S0956796810000304


Part of: JFP Research Articles

Published online by Cambridge University Press: 06 December 2010


Affiliation: CSE Department, Chalmers University of Technology, Göteborg, SE-41296, Sweden (e-mail: ms@chalmers.se)


Abstract


A parallel prefix network of width $n$ takes $n$ inputs $a_1, a_2, \ldots, a_n$ and computes each $y_i = a_1 \circ a_2 \circ \cdots \circ a_i$ for $1 \le i \le n$, where $\circ$ is an associative operator. This is one of the fundamental problems in computer science, because it gives insight into how parallel computation can be used to solve an apparently sequential problem. As parallel programming becomes the dominant programming paradigm, parallel prefix (or scan) is proving to be a very important building block of parallel algorithms and applications. There are many different parallel prefix networks, which differ in properties such as the number of operators, the depth, and the fanout allowed from each operator. In this paper, ideas from functional programming are combined with search to enable a deep exploration of parallel prefix network design, and networks are generated that improve on the best previously known results. It is argued that precise modelling in a functional programming language, together with simple visualization of the networks, gives a new, more experimental approach to parallel prefix network design, improving on the manual techniques typically employed in the literature. The programming idiom that marries search with higher-order functions may well have wider application than the network generation described here.
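The definition can be made concrete with a small executable model. The Python sketch below is not the paper's code (the paper models networks in a functional language); the representation of a network as a list of levels of (i, j) operator pairs, and all function names, are assumptions made for illustration. It builds the serial (ripple) network and a Sklansky-style divide-and-conquer network, checks both against the specification using tuple concatenation as an associative but non-commutative operator, and reports operator count, depth and a simple fanout measure.

```python
# A toy model of parallel prefix networks (an illustrative sketch, not the paper's code).
# Assumed representation: a network of width n is a list of levels; each level is a
# list of pairs (i, j) with i < j, meaning "wire j := wire i (op) wire j".
# All operators in one level read the values present before that level.
from collections import Counter
from operator import add


def serial(n):
    """Ripple network: one operator per level, so size n - 1 and depth n - 1."""
    return [[(i - 1, i)] for i in range(1, n)]


def sklansky(lo, hi):
    """Sklansky-style divide and conquer on wires lo .. hi - 1."""
    n = hi - lo
    if n <= 1:
        return []
    mid = lo + n // 2
    left, right = sklansky(lo, mid), sklansky(mid, hi)
    # Run the two halves side by side, padding the shallower one with empty levels.
    levels = [(left[k] if k < len(left) else []) + (right[k] if k < len(right) else [])
              for k in range(max(len(left), len(right)))]
    # Last level: the final output of the left half feeds every wire of the right half.
    levels.append([(mid - 1, j) for j in range(mid, hi)])
    return levels


def evaluate(levels, xs, op):
    """Run the network on inputs xs with a binary operator op."""
    ys = list(xs)
    for level in levels:
        old = list(ys)  # parallel semantics: read values from before this level
        for i, j in level:
            ys[j] = op(old[i], old[j])  # earlier wire goes on the left: order matters
    return ys


def size(levels):
    return sum(len(level) for level in levels)  # number of operators


def depth(levels):
    return len(levels)  # number of levels (every level here is non-empty)


def max_reads(levels):
    """Largest number of operators in one level that read the same wire (a fanout proxy)."""
    return max((max(Counter(i for i, _ in level).values(), default=0) for level in levels),
               default=0)


def is_prefix_network(levels, n):
    """Check with tuple concatenation, which is associative but not commutative."""
    xs = [(k,) for k in range(n)]
    return evaluate(levels, xs, add) == [tuple(range(k + 1)) for k in range(n)]


if __name__ == "__main__":
    for name, net in (("serial", serial(8)), ("sklansky", sklansky(0, 8))):
        print(name, size(net), depth(net), max_reads(net), is_prefix_network(net, 8))
```

Run as a script, it prints `serial 7 7 1 True` and `sklansky 12 3 4 True`. For eight inputs the serial network needs 7 operators in 7 levels, while the Sklansky network finishes in 3 levels at the cost of 12 operators, with one wire feeding 4 operators in its last level. Operator count, depth and fanout are the competing properties that a design exploration has to balance.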

Type: Articles
Information: Journal of Functional Programming, Volume 21, Issue 1, January 2011, pp. 59–114

DOI: https://doi.org/10.1017/S0956796810000304
Copyright: Copyright © Cambridge University Press 2010

