19 November 2008

DSP Buyer’s Guide


While DSPs have become common tools in many comm developers’ bags of tricks, selecting the right DSP for a given project is no simple matter. The growing number of DSP product offerings has brought with it a wider variety of DSP architectures, complicating the product selection task.

By Henry Davis

Digital signal processors (DSPs) have become pervasive in modern electronic design. Where once only a select group of engineers had the skills to apply this advanced technology, today tens of thousands of hardware and software professionals have the knowledge needed to make digital signal processing part of their companies’ product plans. These skills were acquired over decades of seminars and workshops offered by the major DSP semiconductor suppliers and an ever-growing cadre of independent training companies. Factory-sponsored workshops originally offered generic DSP training with a modest amount of company-specific information. Now, many of these free or relatively low-cost workshops are heavily applications-focused, with significant product-related content. This shift reflects not only the dramatically increased financial importance of digital signal processing to the semiconductor industry, but also the tremendous increase in DSP product complexity. It is no longer practical to learn only the basics of digital signal processing algorithms and then pick up the architecture of a high-performance DSP on the job. The learning curve for engineers using an advanced DSP for the first time is significant, and can demand more effort than learning a general-purpose microprocessor, even though the latter is usually considered the more complex device.

Ruthless, experienced programmers with digital signal processing knowledge used to be able to learn first-generation DSP tricks in a few months. The increased complexity of today’s high-performance DSPs, however, may demand more than six months of hands-on practice before those same programmers are proficient. This longer ramp-up is due in large part to the fact that the underlying architectures differ from those taught in academic courses. With a few exceptions, programmers don’t get to experience these alternative architectures until they begin work on an advanced DSP project as employees. Experience-for-hire therefore plays a bigger role in bootstrapping engineers than it has in the past. A small but important number of former factory applications engineers offer consulting services as independent professionals, bringing both engineering capacity and capability to their clients. For some companies, the choice of a specific advanced DSP depends on the availability of experts to assist during the first design.

Fast as digital signal processing has grown in use, there remains a vast untapped market of engineering professionals who have related skills but lack experience developing DSP-based products. One of the major successes of digital signal processing has been the strong identification of the DSP as a processor type. This has given engineers a focus and definition to work within as they gather the critical skills required to implement new and more complex systems. While this identification has benefited many, it also obscures the underlying technology: digital signal processing is about algorithms and processes that can be applied to specific types of real-world data. Any processor can perform digital signal processing; it is just a matter of bandwidth.
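To illustrate that point, here is a minimal sketch of a direct-form finite impulse response (FIR) filter in plain Python, runnable on any general-purpose processor. The function names, the 4-tap moving-average coefficients, and the 64-tap, 8-kHz workload are illustrative assumptions, not taken from the article. Each output sample costs one multiply-accumulate (MAC) per tap, so the bandwidth a processor must supply scales with taps times sample rate; that product, not the processor label, decides whether a given chip can keep up.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    history = [0.0] * len(coeffs)     # delay line, newest sample first
    output = []
    for x in samples:
        history = [x] + history[:-1]  # shift in the new sample, drop the oldest
        acc = 0.0
        for c, h in zip(coeffs, history):
            acc += c * h              # one multiply-accumulate (MAC) per tap
        output.append(acc)
    return output


def required_macs_per_second(taps, sample_rate_hz):
    """MAC rate a processor must sustain to run this filter in real time."""
    return taps * sample_rate_hz


print(fir_filter([0, 4, 4, 4, 4], [0.25, 0.25, 0.25, 0.25]))  # [0.0, 1.0, 2.0, 3.0, 4.0]
print(required_macs_per_second(64, 8000))                      # 512000
```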

Defining bandwidth requirements

Developers of DSP-based integrated circuits once pushed new product plans ahead by adhering to a simple philosophy: there’s never enough digital signal processing power. Developers could be reasonably assured of a successful product simply by turning the crank to create ever-faster processors. As quickly as developers could create faster DSPs, engineers would clamor for more performance. But as standards-based products such as cell phones, modems, video codecs, and audio processors matured and became mass produced, DSP capabilities could be grouped into application-related performance bands. For example, the first-generation (full-rate) GSM voice codec could be implemented in less than 10 million DSP instructions per second (DSP MIPS), while the half-rate GSM coder required less than 33 DSP MIPS. These two applications create natural partitions in the performance spectrum. Mass-produced DSPs for full-rate GSM must meet the performance requirement, but should not exceed it by a substantial margin. This application-based partitioning of performance requirements has continued to be a factor in developing new DSPs.
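The band idea reduces to simple budget arithmetic. The sketch below treats the article’s upper bounds (under 10 DSP MIPS for full-rate GSM, under 33 for half-rate) as worst-case per-channel budgets and asks how many channels fit on a part. The 80-MIPS rating and the 20% reserve for control code are hypothetical inputs chosen for illustration.

```python
# Per-channel upper bounds quoted in the article, used here as worst-case budgets.
CODEC_MIPS = {"GSM full-rate": 10, "GSM half-rate": 33}


def channels_supported(dsp_mips, codec_mips, reserve_percent=0):
    """Number of codec channels that fit in a DSP's instruction budget.

    reserve_percent holds back part of the budget for control code,
    interrupts, and so on; the value is an illustrative assumption.
    """
    usable_mips = dsp_mips * (100 - reserve_percent) // 100
    return usable_mips // codec_mips


for codec, mips in CODEC_MIPS.items():
    print(codec, channels_supported(80, mips), channels_supported(80, mips, reserve_percent=20))
# GSM full-rate 8 6
# GSM half-rate 2 1
```

A hypothetical 12-MIPS part yields exactly one full-rate channel with a little to spare, which is the "meet the requirement but don’t exceed it by much" target described for mass-produced handsets.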

Even though application requirements can cap the performance a given DSP needs, some applications and system designs still demand the maximum performance possible.

Competing architectures

Trade-offs among performance potential, application needs, speed, power, price, and capability have opened the field of DSP architecture to a wider group of suppliers, and with more suppliers comes a greater diversity of architectures. Each company brings its past experience to bear on a DSP architecture, and as the company gains experience in DSP-based solutions, its product offerings migrate. Texas Instruments’ (TI’s) TMS320C5xx families illustrate this trend. The original TI DSP, designed in 1982, was a modified Harvard architecture that physically and logically split program and data memory into separate address spaces. “Modified” refers to the addition of a bus interchange module that permits limited data exchanges between the program and data memories; pure Harvard architectures keep the two spaces completely separate. For the TMS32010 DSP, the ability to transfer data values from program to data memory is critical. Program memory for mass-produced parts is ROM, just as TI used in its TMS1000 4-bit microcontroller. Since filter coefficients do not change, the obvious place to store them is in the ROM space. This and other considerations led to departures from the pure Harvard philosophy.
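The sketch below is a toy software model, not a description of TI’s hardware. It keeps program and data memory as separate address spaces, makes program memory read-only to stand in for mask ROM, and adds one narrow transfer path (the hypothetical `table_read` method) that copies constant coefficients from program space into data RAM. That single path is what separates a modified Harvard design from a pure one. The memory sizes and word values are made up.

```python
class ModifiedHarvardModel:
    """Toy model of separate program and data address spaces.

    Program memory is a tuple (read-only, like mask ROM); data memory is a
    writable list (RAM). table_read() is a hypothetical stand-in for the
    limited program-to-data path that makes the design "modified" Harvard.
    """

    def __init__(self, program_rom, data_words):
        self.program = tuple(program_rom)   # immutable: models ROM
        self.data = [0] * data_words        # writable data RAM

    def table_read(self, prog_addr, data_addr, count=1):
        """Copy `count` words from program space into data space."""
        for i in range(count):
            self.data[data_addr + i] = self.program[prog_addr + i]


# Coefficients never change, so they ship in ROM next to the code and are
# moved into data RAM, where the arithmetic unit can read them.
rom = [0x1234, 0x0800, 0x0400, 0x0200]   # pretend: one code word, then 3 coefficients
dsp = ModifiedHarvardModel(rom, data_words=8)
dsp.table_read(prog_addr=1, data_addr=0, count=3)
print(dsp.data[:3])   # [2048, 1024, 512]
```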

The TMS32010 led to the second-generation TI DSP, the TMS320C25, a part still in production. The ‘C25 added some instructions and modified the peripherals based on experience with general-purpose DSP applications. The next change in the basic TI architecture was fueled by a short-lived relationship with Intel. As part of an ASIC agreement between the two companies, TI undertook a design program that would let both standardize on the ‘C25’s architecture. TI’s experience in developing customized DSPs for large customers prompted it to restructure the physical layout of the ‘C25 to include a specialized peripheral bus and full JTAG support for testing. Intel’s experience in the controller market led it to lobby successfully for the inclusion of bit-manipulation instructions in the DSP instruction set. These contributions from TI and Intel shaped the TMS320C50 into a more capable general-purpose part, while maintaining significant source-code compatibility with previous generations of the architecture.

Success breeds specialization

According to Will Strauss, president of Tempe, AZ-based Forward Concepts, TI has maintained the lead in DSP product sales ever since he began publishing his DSP market and strategy study over a decade ago. During the early 1990s, TI’s largest applications were hard disk drive motor control and data modems. Both applications demanded variations of the standard product offering to improve performance and reduce the cost of the final product’s electronics. Together with the enormous demand for voice codecs for European GSM, these needs for specialization drove differentiation in TI’s offering. The ‘C5x spawned two additional part families: the TMS320C2xx and TMS320C54x. The ‘C2xx is focused on motor control and similar applications, while the ‘C54x is targeted at GSM and other digital cellular standards-based applications. Both families continue to evolve, with higher-performance parts, new peripherals, and larger memory sizes announced with some regularity. The variations and alternatives within each family are extensive, and the programmer’s control over features such as memory maps creates hundreds of in-system variants. The ‘C54x family now includes twenty-six standard part types, including the TMS320UC5402 fixed-point DSP, which is aimed at low-power, high-performance applications. The ‘C5402 offers low power consumption and the flexibility to support the different system voltage configurations commonly found in battery-powered applications. Its wide I/O voltage range lets it operate from a single 1.8-V supply or from dual supplies in mixed-voltage systems. This eliminates the need for external level shifting and reduces power consumption in systems below 3 V. The part includes three separate 16-bit data memory buses and one program memory bus to optimize memory access during multifunction instruction execution.

On-chip peripherals include: a software-programmable wait-state generator; programmable bank switching; an on-chip phase-locked loop (PLL) clock generator with internal oscillator or external clock source; two multichannel buffered serial ports (McBSPs); an enhanced 8-bit parallel host-port interface (HPI8); two 16-bit timers; a 6-channel direct memory access (DMA) controller; power consumption control with the IDLE1, IDLE2, and IDLE3 instructions and power-down modes; CLKOUT off control to disable CLKOUT; on-chip scan-based emulation logic; and IEEE Std 1149.1 (JTAG) boundary-scan logic. The DSP achieves 80 MIPS, executing one instruction per clock cycle, roughly sixteen times the 5 MIPS of the original TI TMS32010. Because the part is destined for portable, battery-powered products, it needs thin, space-saving packaging: the ‘C5402 is available in a 144-pin TQFP and a 144-pin BGA.

The other offshoot of the ‘C25 lineage is the ‘C24x and ‘C240x part families, including a series of DSP cores employing the same instruction set. The ‘C24x family includes eleven part types. The TMS320LF240x and TMS320LC240x devices are based on the TMS320C2xx generation of fixed-point DSPs and offer the enhanced TMS320 architectural design of the ‘C2xx core CPU. Several advanced peripherals optimized for digital motor and motion control have been integrated to provide a single-chip DSP controller. While code-compatible with the ‘C24x DSP controllers, the ‘C240x offers increased processing performance (30 MIPS) and a higher level of peripheral integration.

The ‘C240x family offers a selection of memory sizes and peripherals tailored to the price/performance points of mass-production applications. Flash-based devices with up to 32k words of memory provide reprogrammable solutions for applications that require field upgrades. Flash memory also serves development and initial prototyping for applications that migrate to ROM-based devices in production; the Flash devices and their corresponding ROM devices are pin-to-pin compatible.

The total signal chain

TI was not the only DSP company to follow the path toward application specificity. Analog Devices, Inc. (ADI), once a secondary player in the DSP business, began in 1989 to shift its focus to the digital communications market, almost to the exclusion of all other applications during the initial years of the company’s microcomputer DSPs. Like TI as it moved beyond its earliest DSPs, ADI took an application-focused approach to development. But ADI made two key strategic decisions: future products would retain upward compatibility with the original microprocessor DSP (the ADSP2100), and the company would look at the entire signal chain, not just the digital portion. Unlike other DSPs offered in the late 1980s, the ADSP2100 was a true microprocessor designed specifically for DSP applications, with no peripherals or memory on board. The decision to consider the entire signal chain gave rise to ADI’s mixed-signal processors for digital cellular and other voice applications. ADI’s most recent fixed-point products have retained upward source compatibility with the original ADSP2100 to some degree, but use superscalar implementations to improve performance. To benefit from the improvements in ADI’s TigerSHARC, legacy programs must be rewritten to exploit the added capabilities. The TigerSHARC is a static superscalar architecture that incorporates many aspects of conventional superscalar processors, including a load/store architecture, branch prediction, and a large interlocked register file. The term “static” applies because instruction-level parallelism is determined before run time and encoded in the program. All the registers are interlocked, supporting a simple programming model that is independent of implementation latencies and is fully interruptible. Branch prediction is supported by a 128-entry branch target buffer (BTB) that reduces branch latency. Program code is stored in quad-word memory with no wasted space.

The processor can execute eight 16-bit multiply-accumulates (MACs) per cycle with 40-bit accumulation, two 32-bit MACs per cycle with 80-bit accumulation, or two 16-bit complex MACs per cycle.
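To show what a branch target buffer does, here is a toy direct-mapped BTB model. The 128-entry size, the modulo indexing, and the evict-on-not-taken policy are illustrative assumptions; ADI’s actual organization and replacement policy are not described in this article. A hit lets the fetch stage jump straight to the cached target instead of waiting for the branch to resolve, which is how a BTB reduces branch latency.

```python
class BranchTargetBuffer:
    """Toy direct-mapped branch target buffer (BTB).

    Each entry caches the address of a recently taken branch (the tag) and its
    target, so fetch can be redirected before the branch resolves.
    """

    def __init__(self, entries=128):
        self.entries = entries
        self.tags = [None] * entries
        self.targets = [0] * entries

    def _index(self, pc):
        return pc % self.entries

    def predict(self, pc):
        """Return the cached target for pc, or None (predict fall-through)."""
        i = self._index(pc)
        return self.targets[i] if self.tags[i] == pc else None

    def update(self, pc, taken, target):
        """Record taken branches; evict the entry when the branch falls through."""
        i = self._index(pc)
        if taken:
            self.tags[i], self.targets[i] = pc, target
        elif self.tags[i] == pc:
            self.tags[i] = None


btb = BranchTargetBuffer()
print(btb.predict(0x40))          # None: first encounter, no prediction
btb.update(0x40, taken=True, target=0x10)
print(btb.predict(0x40))          # 16: loop-back branch is now predicted
print(btb.predict(0x40 + 128))    # None: same index, different tag
```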

Reflecting ADI’s experience in serving the communications segment, the TigerSHARC includes a single-cycle add-compare-select (ACS) operation for the Viterbi algorithm, an add-subtract instruction and hardware bit reversal for FFTs, and a 64-bit generalized bit-manipulation unit.
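For readers who have not met these kernels, the sketch below gives plain-software versions of the operations the paragraph names: one add-compare-select step of a Viterbi decoder (using a distance metric where smaller is better, one common convention), the bit-reversed reordering that an in-place radix-2 FFT needs, and the add-subtract pair at the heart of an FFT butterfly. On a processor without dedicated support, each is a sequence of several instructions, which is why single-cycle hardware versions matter. The function names are hypothetical.

```python
def acs(metric_a, metric_b, branch_a, branch_b):
    """One Viterbi add-compare-select step for a trellis state.

    Adds each predecessor's path metric to its branch metric, keeps the
    smaller sum (distance metric), and returns the survivor metric plus a
    decision bit recording which predecessor won (used later in traceback).
    """
    cand_a = metric_a + branch_a
    cand_b = metric_b + branch_b
    if cand_a <= cand_b:
        return cand_a, 0
    return cand_b, 1


def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index, e.g. 0b011 -> 0b110 for bits=3."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reverse_permute(data):
    """Reorder a power-of-two-length sequence into bit-reversed index order."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    return [data[bit_reverse(i, bits)] for i in range(n)]


def butterfly(a, b, twiddle):
    """Radix-2 butterfly: the add-subtract pair applied to a and twiddle*b."""
    t = twiddle * b
    return a + t, a - t


print(acs(3, 5, 2, 1))                       # (5, 0)
print(bit_reverse_permute(list(range(8))))   # [0, 4, 2, 6, 1, 5, 3, 7]
print(butterfly(1, 2, 1))                    # (3, -1)
```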

The TigerSHARC processes 8-, 16-, and 32-bit data as native types, which lets the processor scale the number of operations completed in a cycle to the length of the data type being processed. Each of the two computation blocks (CBX and CBY) contains a multiplier, an arithmetic and logic unit (ALU), and a 64-bit shifter. With these resources, a single cycle can execute eight 40-bit MACs on 16-bit data, two 40-bit MACs on 16-bit complex data, or two 80-bit MACs on 32-bit data. With 8-bit data types, the architecture can issue sixteen operations in one cycle, executing 8 billion operations per second. The TigerSHARC features a short-vector memory architecture organized as three 128-bit-wide banks. Quad (128-bit), long (64-bit), and normal (32-bit) word accesses move data from the memory banks to the register files for processing. Four 32-bit instruction words can be fetched, and 256 bits of data can be loaded to the register files or stored into memory, in a single cycle. The memory architecture can store 8-, 16-, and 32-bit data in contiguous, packed memory. Internal and external memories are organized in a unified memory map, and the partition between program memory and data memory is user-determined.
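The TigerSHARC does this sub-word parallelism in hardware. As a software analogy only (not ADI’s implementation), the classic SIMD-within-a-register (SWAR) trick below performs two 16-bit additions with a single 32-bit add, which is the same idea of getting more operations per cycle from narrower data. The lane layout and helper names are hypothetical.

```python
LOW15 = 0x7FFF7FFF   # every bit except the top bit of each 16-bit lane
HIGH1 = 0x80008000   # just the top bit of each 16-bit lane


def pack16x2(high, low):
    """Pack two unsigned 16-bit values into one 32-bit word."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def unpack16x2(word):
    """Split a 32-bit word back into its (high, low) 16-bit lanes."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def add16x2(a, b):
    """Add both 16-bit lanes with one 32-bit add; each lane wraps mod 2**16."""
    partial = (a & LOW15) + (b & LOW15)   # carries stop at each lane's top bit
    return partial ^ ((a ^ b) & HIGH1)    # restore each lane's top bit


print(unpack16x2(add16x2(pack16x2(1000, 2000), pack16x2(3, 4))))   # (1003, 2004)
```

Masking off each lane’s top bit before the add guarantees that a carry out of the low lane never leaks into the high lane; a plain 32-bit add of the same packed words corrupts the high lane whenever the low lane overflows.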

The computational resources are controlled by a sequencer that can issue up to four 32-bit instructions in parallel. One or two of these instructions can control more than one computational unit, reducing code size and power consumption. Programmers can control how individual instructions to each of the computation units are issued.

Joining forces

Developing DSP architectures is a time-consuming, expensive process that few companies can afford. Even though each is large enough to fund separate development efforts, Motorola and Lucent have joined forces to develop new DSP architectures. Their joint venture group, StarCore, has already specified its first core, the StarCore SC140. This DSP core includes four MACs and executes more than one billion MACs per second. The StarCore alliance pools some of the industry’s most experienced DSP engineers, including Jim Boddie, one of the most seasoned DSP architects in the industry. Boddie has been a key figure in nearly every Lucent DSP, including those developed twenty years ago for internal use.

The SC140 is a 16-bit DSP core, initially available at a clock speed of 300 MHz and an operating voltage range of 0.9 to 1.5V. Faster clock speeds and lower-voltage versions are planned for the future.

The core includes twelve data execution units: four MACs, four general-purpose arithmetic logic units (ALUs), and four bit-field units (BFUs). Each MAC unit can execute a 16 x 16-bit fractional or integer multiplication and add the result to a 40-bit accumulator in a single clock cycle. The ALUs perform general calculations such as adds, subtracts, compares, and maximum-value operations. The BFUs perform bit-field functions; each incorporates a 40-bit barrel shifter to speed operations such as multibit shifts, bit rotations, and inserts that are common in communications processing. Integrating four such barrel shifters on a single DSP core is unique and speeds the SC140’s execution of communication algorithms.
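To make the 16 x 16-bit fractional MAC and 40-bit accumulator concrete, here is a minimal integer model. It assumes the common fractional-mode convention in which the Q30 product of two Q15 operands is shifted left one bit to Q31 before accumulation, and it saturates at the 40-bit two’s-complement limits; whether and when the SC140 itself saturates depends on mode settings this sketch does not model. The eight bits above the 32-bit Q31 result act as guard bits.

```python
ACC_BITS = 40
ACC_MAX = (1 << (ACC_BITS - 1)) - 1    # largest 40-bit two's-complement value
ACC_MIN = -(1 << (ACC_BITS - 1))       # smallest 40-bit two's-complement value


def to_q15(x):
    """Convert a float in [-1, 1) to a Q15 integer, clamping at the ends."""
    return max(-32768, min(32767, round(x * 32768)))


def mac_fractional(acc, a, b):
    """Return acc + (a * b << 1), saturated to the 40-bit accumulator range."""
    acc += (a * b) << 1      # Q15 x Q15 = Q30; shift left once to align as Q31
    return max(ACC_MIN, min(ACC_MAX, acc))


def dot_q15(xs, hs):
    """Accumulate a Q15 dot product in the modeled 40-bit accumulator."""
    acc = 0
    for x, h in zip(xs, hs):
        acc = mac_fractional(acc, x, h)
    return acc


print(to_q15(0.5), mac_fractional(0, to_q15(0.5), to_q15(0.5)))   # 16384 536870912
print(dot_q15([32767] * 200, [32767] * 200))                      # 429470515600
```

Two full-scale products already overflow a 32-bit signed accumulator, but the guard bits let 200 of them accumulate exactly, as the second line shows.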

The core’s program control unit includes a program sequencer, which fetches instructions and performs loop and branch control. The SC140 has a five-stage pipeline: program pre-fetch, program fetch, dispatch/decode, address generation, and execute. This is a relatively short pipeline by current DSP standards. The shorter pipeline simplifies assembly language programming and improves hardware branch and interrupt handling by reducing the number of conditions that programmers and hardware must consider. Up to eight 16-bit data words can be fetched at once over two 64-bit data buses, for a total bandwidth of 4.8 Gbytes/s. The program data bus is 128 bits wide, allowing up to two prefixes and six instructions to be fetched per cycle.
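The headline figures follow from simple arithmetic on the four MACs and the 300-MHz initial clock, assuming the data buses transfer once per core clock (an assumption; the article does not state the bus clock). Four MACs at 300 MHz give 1.2 billion MACs per second, and two 64-bit buses move 16 bytes per cycle, which works out to 4.8 Gbytes/s, or 38.4 Gbit/s.

```python
CLOCK_HZ = 300_000_000   # initial SC140 clock speed quoted in the article
MAC_UNITS = 4
DATA_BUSES = 2
DATA_BUS_BITS = 64       # assumed to transfer once per core clock

peak_macs_per_s = MAC_UNITS * CLOCK_HZ
bits_per_cycle = DATA_BUSES * DATA_BUS_BITS
words16_per_cycle = bits_per_cycle // 16
bytes_per_s = bits_per_cycle // 8 * CLOCK_HZ
bits_per_s = bits_per_cycle * CLOCK_HZ

# Four MACs each take two 16-bit operands, so eight words per cycle would
# feed them fully if every operand came straight from memory.
operands_needed = MAC_UNITS * 2

print(peak_macs_per_s)                      # 1200000000
print(words16_per_cycle, operands_needed)   # 8 8
print(bytes_per_s, bits_per_s)              # 4800000000 38400000000
```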

While TI and ADI have demonstrated their ability to engineer a family of parts and architecture enhancements, the StarCore promise is still too new to evaluate completely. Both partners are capable of creating an ongoing series of cores and parts to meet future applications needs, and both have stated that they will develop products based on StarCore technologies. With the first products planned for the year 2000, StarCore partners promise to continue developments to meet their own market needs.

The exotic architectures

Harvard architectures and superscalar implementation techniques are two of the better understood technologies used to create advanced DSPs. But as engineers have demanded greater levels of performance, DSP designers have resorted to a wide variety of more exotic architectures and implementation techniques.

TI was the first mainstream DSP company to embrace the very long instruction word (VLIW) architecture as a way to improve performance. VLIW uses an instruction word large enough to hold several basic instructions. TI adopted an 8-way VLIW architecture in which programmers may specify up to eight basic operations in every cycle. The advantages of this type of architecture are tremendous flexibility and high performance. The drawback is that it is extremely difficult for all but the most expert programmers to develop memory- and performance-optimized code by hand. Instead, optimizing software determines an acceptable schedule for the individual instructions and rearranges them to achieve the best performance (with a few restrictions). The TI VLIW DSPs (the TMS320C6x, including the TMS320C67x floating-point parts) may be the first DSPs to force mere-mortal DSP programmers to use C during development.
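What such a scheduler needs is independent work. As a language-neutral illustration (the function names are hypothetical, and Python itself will not run these in parallel), compare a dot product written with one accumulator, where every add waits on the previous one, against the same computation split into independent partial sums that a VLIW scheduler could place in the same cycle on different functional units. Integer data keeps both results identical; with floating point, reordering the adds can change rounding, one reason compilers are cautious about this transformation.

```python
def dot_single_chain(x, h):
    """One accumulator: each add must wait for the previous add to finish."""
    acc = 0
    for a, b in zip(x, h):
        acc += a * b
    return acc


def dot_split_chains(x, h, lanes=4):
    """Same dot product using `lanes` independent partial sums.

    Partial sum k collects elements k, k + lanes, k + 2*lanes, ...; the sums do
    not depend on one another until the final combine, so their multiplies and
    adds are candidates to share a cycle on parallel functional units.
    """
    partial = [0] * lanes
    for i, (a, b) in enumerate(zip(x, h)):
        partial[i % lanes] += a * b
    return sum(partial)


x = list(range(1, 11))
h = list(range(10, 0, -1))
print(dot_single_chain(x, h), dot_split_chains(x, h))   # 220 220
```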

Where TI’s ‘C6x family relies on VLIW as the basic instruction format, Infineon (formerly Siemens Semiconductor) observed that most DSP programs consist of a small amount of time-critical, DSP-specific code coupled with much larger amounts of control and general-purpose code. To save code size and simplify programming, Infineon’s Carmel DSP combines a conventional CISC instruction set with the ability to trap to a VLIW program memory space. High-performance signal processing code or specialized instructions reside in a 1k-word configurable long instruction word (CLIW) memory, while compact code is stored in ordinary program memory. A CLIW reference in ordinary memory selects the CLIW code fragment to execute, without any branching overhead.

Whether the core uses VLIW or CLIW, the use of software branch interlocks places an extra burden on programmers and development tools. The higher performance comes at the expense of needing to consider pipeline effects on individual instructions. Ruthless programmers can produce amazingly small and efficient code. Mere mortals may have their hands full in keeping track of the warning messages produced by the development tools.

Choosing a DSP

Many factors go into choosing a specific DSP. Company reputation, support, price, availability, and quality are just a few of the factors considered when choosing a complex product like a DSP, and there are more parts to choose from than those mentioned in this article. Please see the product table and vendor list for a complete listing of DSPs available on the open market (the product table can be viewed at www.csdmag.com). Two of the DSP industry’s engineering godfathers offer hard-won advice. Robert Owen, a long-time consultant in Saratoga, CA (and a member of the team that developed Intel’s original 2920 DSP two decades ago), observed that it is necessary to “make sure that the part is a real thing. DSPs are hard to develop, and you can get lost in the architecture subtleties. But the question is: Which ones will become living breathing products?” Auburn, CA-based consultant Richard Blasco (designer of the first commercial DSP, AMI’s 2811 SPP) offers this sage advice: “Choose a DSP that has enough performance, but not too much. Remember that digital signal processing can be done on any processor — it’s all a matter of bandwidth.” With hundreds of DSP-based processors available, nearly any need can be met with a near-perfect technical match. Now that more than half of all semiconductor companies offer some type of DSP technology, finding a company that fits your business and technology plans is only a matter of searching.

Henry Davis is president of Henry Davis Consulting, a new-products consultancy based in Soquel, CA. Davis is a contributing editor for Communication Systems Design. He holds a BS in computer science and business administration from Columbia Pacific University, and has done graduate work at the New Mexico Institute of Mining and Technology. He can be reached at hdavis@ix.netcom.com.
