Failure analysis techniques for semiconductors and other devices

Feb 1, 2001 12:00 PM, V. Lakshminarayanan

Knowing why devices fail is a must when designing next-generation products.

Today's electronic systems are becoming more complex and compact. Concepts of quality and reliability are increasingly applied to products, and yet system component failures are still common. Failure of a system disrupts service and causes costly down-time for repair, which affects the economy of operation. Failure analysis (FA) can give valuable insight into the causes of failure and provide inputs for product improvement. It is also a tool for system reliability evaluation. Several techniques are used to carry out the FA of electronic components, some of which are described in this article. Examples based on FA case studies are also presented to illustrate the various techniques used.

Failure analysis identifies the causes of failure by analyzing stresses and other mechanisms causing failure. It also addresses the performance degradation of components and develops corrective measures. A failure mechanism usually leads to an identifiable change in a component. Semiconductor device failures can be explained using a physics-of-failure approach.

The bath-tub curve (see Figure 1) is a commonly used model for describing the failure of components, whether electronic or mechanical. Infant mortality failures are due to defects introduced during manufacture, improper design or implementation of the component, or freak defects that escaped screening. The normal operation region of the curve, which is the useful life phase of the component, is long for electronic components compared with mechanical parts. Most product designs are revised before their electronic components reach the end of this phase.
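The shape of the bath-tub curve can be sketched numerically. The following is a minimal illustration that is not taken from the article: it assumes the hazard (failure) rate can be approximated as the sum of a decreasing Weibull term for infant mortality, a constant term for the useful life phase, and an increasing Weibull term for wear-out. The function names and every parameter value are hypothetical and chosen only to produce a recognizable shape.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Approximate bath-tub hazard rate at time t (hours); all parameters are assumed."""
    infant = weibull_hazard(t, shape=0.5, scale=2000.0)      # shape < 1: rate falls with time
    useful = 1.0 / 50000.0                                   # constant rate during useful life
    wearout = weibull_hazard(t, shape=5.0, scale=100000.0)   # shape > 1: rate rises with time
    return infant + useful + wearout


if __name__ == "__main__":
    for hours in (10, 100, 1_000, 10_000, 50_000, 100_000, 150_000):
        print(f"{hours:>7} h  hazard = {bathtub_hazard(hours):.3e} per hour")
```

With these assumed parameters the printed rate starts high, falls to a minimum in the middle of the range, and rises again at the end. Real components would need parameters fitted to field or test data.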
Failures during the useful life phase are usually caused by stresses such as high temperature (thermal overstress), high voltage or current (electrical overstress), humidity, vibration, and mechanical or thermal shock. Wear-out failures occur after the useful life phase of the component has passed. Examples of wear-out failure are corrosion, electrical leakage, insulation breakdown, migration of metallic ions in the direction of current flow, cracking of the encapsulating material as it deteriorates, and cracks in the bond wires due to repeated stresses.

Failure analysis as a tool

FA is used to evaluate the reliability of a product under actual operation. A failed component can provide important information for enhancing the reliability of a device or product. Depending on the type of component failure, the failure mode, the failure mechanism, and the factors that induced the failure (such as stresses) can be identified, and appropriate corrective measures can be initiated. FA provides feedback to product designers for improving the design or correcting minor design faults that might have been overlooked initially.

The objective of FA is to identify the cause of failure and initiate corrective action. Feedback from FA can lead to improvements in the design or construction of the component, or in the design of the product where the component is used. Such action usually modifies the design by incorporating additional components or by modifying the application circuit.

Several techniques are used in FA work to find the cause of failure. The method used in an FA investigation depends on the severity and type of problem. The techniques used for electronic devices range from simple electrical measurements to examination of decapsulated samples under a microscope. Maximum information should be gathered from the failed samples using non-destructive methods before the devices are opened.
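The rule of exhausting non-destructive methods before opening a device can be expressed as a simple ordering of a planned analysis. The sketch below is only an illustration: the `Step` type, the `order_steps` helper, and the classification of each step are assumptions made for this example, not a procedure given in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    opens_package: bool  # True if the step requires or causes opening the device


def order_steps(steps):
    """Place all non-destructive steps first.

    sorted() is stable and False sorts before True, so the relative order
    inside each group is preserved.
    """
    return sorted(steps, key=lambda step: step.opens_package)


if __name__ == "__main__":
    # Hypothetical plan, deliberately listed out of order.
    plan = [
        Step("Decapsulation", True),
        Step("Electrical test on curve tracer", False),
        Step("High-magnification optical microscopy", True),
        Step("X-ray examination", False),
        Step("Low-magnification external inspection", False),
    ]
    for step in order_steps(plan):
        kind = "destructive" if step.opens_package else "non-destructive"
        print(f"{kind:<16}{step.name}")
```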

Investigative procedures

To initiate a proper corrective action, the failure mechanism of the component must be correlated with field observations of the operating conditions that caused the failure. The first step is to collect data on the number of failures observed and the sample size. This information, when collected over a large number of locations where the system is deployed, helps the analyst judge the statistical significance of the failure. Information should also be collected on the observed problem and the conditions under which the failure occurred. If the failure occurs during a particular test condition or operation, the stresses likely to be encountered by the component can give a clue to the possible cause of failure.

Sometimes, equipment that functions well in one area develops problems at other geographical locations because of environmental or EMI/RFI conditions. Furthermore, the problem may not lie with the failed component at all, but with a wrong application of the component or a lack of proper circuit protection. These issues may cause the component to fail by exposing it to stress levels higher than those it is rated to withstand.

Based on the data collected, a procedure is chosen. As a preliminary step, a rough flow-chart of the FA procedure to be used is drawn up based on the type of problem. Information is collected about the design, i.e., the circuit diagram of the card where the component is used, the data sheet of the failed component, make and batch number details, the number of failures, and the conditions under which failures occur. Based on this preliminary study, a hypothesis is formed about the cause and type of failure. In the next step, fresh samples of the devices from the same batch are sought for further testing.
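One way to judge whether a failure count is statistically meaningful is to report the observed failure fraction at each location together with a confidence interval. The sketch below uses the Wilson score interval, a standard statistical method that the article itself does not prescribe. The function name, the site names, and the counts are made up for illustration.

```python
import math


def wilson_interval(failures, sample_size, z=1.96):
    """Wilson score interval for a failure fraction.

    z = 1.96 corresponds to roughly 95 % confidence.
    """
    if sample_size <= 0 or not 0 <= failures <= sample_size:
        raise ValueError("need sample_size > 0 and 0 <= failures <= sample_size")
    p = failures / sample_size
    denom = 1 + z * z / sample_size
    centre = (p + z * z / (2 * sample_size)) / denom
    half = (z / denom) * math.sqrt(
        p * (1 - p) / sample_size + z * z / (4 * sample_size ** 2)
    )
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical field data: (failures, units deployed) per location.
    sites = {"site A": (3, 40), "site B": (4, 2000)}
    for name, (failures, deployed) in sites.items():
        low, high = wilson_interval(failures, deployed)
        print(f"{name}: {failures}/{deployed} failed, "
              f"interval {low:.4f} to {high:.4f}")
```

If the intervals for two locations do not overlap, the difference in failure rate is unlikely to be chance alone, which points toward a location-specific cause such as environment or application. Small samples give wide intervals, so a handful of failures from a few units rarely supports a firm conclusion.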
Testing fresh samples helps identify component-level faults, inherent device problems, application problems, batch-related problems, and any other related cause of failure. The components, unless they were destroyed by the failure, are tested electrically in a component tester that checks the functionality of the device, or in a curve tracer to study the V-I characteristics. A similar test is carried out on a good sample to reveal differences in electrical characteristics between the good and failed devices. Once the failure is verified, the mode of failure is identified by any of the FA techniques described further on. Depending on circumstances, the analyst may also decide to analyze the system in which the component operates.

Components do not generally fail on their own. The cause of failure is rarely identified as an infant mortality case or a batch problem. Failures are usually due to external forces: electrical overstress (EOS), thermal overstress, mishandling (e.g., electrostatic discharge (ESD) damage), or problems created by nearby components or associated circuits. Sometimes other components in the circuit cause the failure of a device (e.g., excessive leakage inductance in a defective transformer). High-temperature heat sinks mounted close to electrolytic capacitors can also cause those capacitors to fail. Defective PCB construction, the operating environment, and similar factors can cause failures as well.

After an FA investigation is complete, a report should be made detailing the analysis. The report should include the problem reported, the analysis carried out, test results, readings of parameters (if taken), techniques used for the investigation, the batch number and make of the component involved, the exact cause of failure identified, and the corrective action recommended to address the failure.

Over the years, several techniques have been developed for analyzing various components.
Some of the commonly used techniques for the analysis of a semiconductor component are listed in Table 1. The method chosen in each case depends on the extent and type of failure observed.

FA techniques

When a device failure is identified, it is necessary to proceed systematically. First, the faulty card or module should be examined thoroughly for any visible and obvious manifestations of failure or damage, such as charring of the device. Try to collect as much data as possible about the extent of the failure. Analyze the frequency of failure, the conditions under which it occurs, whether it occurs during any particular load or test condition, the number of failures and the corresponding sample size, and whether any correlation can be drawn between related failures of the cards or components.

Go about the analysis in a step-by-step fashion, using non-destructive techniques initially and gradually progressing to destructive methods. The objective of this approach is to avoid destroying evidence and to ensure that the chemical actions of destructive methods do not lead to a faulty analysis. For example, the use of acids for etching a plastic package could corrode metallic parts in the device. Once metalization has been exposed to corrosive chemicals and moisture, it may be difficult to determine whether the corrosion existed before the device was opened or was caused by chemical action afterward.

Environmental conditions such as humidity, temperature, dust, salinity, and the presence of chemical contaminants in the area of operation should be noted. Any drift in the component's operating parameters should be examined, as well as any associated fault in another component that may have triggered the failure.
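Parameter drift can be flagged by comparing readings from the suspect device with those from a known-good sample of the same type. The following is a minimal sketch under stated assumptions: the `find_drifted` helper, the 10 % tolerance, the parameter names, and all values are hypothetical. In practice the limits would come from the component's data sheet.

```python
def find_drifted(reference, measured, tolerance=0.10):
    """Return {parameter: fractional deviation} for parameters that deviate
    from the reference reading by more than the given fractional tolerance."""
    drifted = {}
    for name, ref_value in reference.items():
        if name not in measured or ref_value == 0:
            continue  # nothing to compare, or relative deviation is undefined
        deviation = (measured[name] - ref_value) / abs(ref_value)
        if abs(deviation) > tolerance:
            drifted[name] = deviation
    return drifted


if __name__ == "__main__":
    # Hypothetical readings from a good sample and a suspect device.
    good = {"forward_voltage_V": 0.70, "leakage_current_uA": 0.05, "current_gain": 200.0}
    suspect = {"forward_voltage_V": 0.71, "leakage_current_uA": 4.0, "current_gain": 120.0}
    for name, deviation in find_drifted(good, suspect).items():
        print(f"{name}: {deviation:+.0%} relative to the good sample")
```

A relative comparison suits parameters that scale with the device, while quantities that are normally near zero, such as leakage current, are often better checked against an absolute limit.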

Functional and parametric electrical tests

For semiconductor and passive components, the failed sample is electrically tested in an automatic tester, or with laboratory instruments (oscilloscope, test pattern generator, curve tracer), to verify the failure and observe the critical parameters. Device malfunction and any deviation from standard device characteristics can be observed by this method. A curve tracer is used to study the input/output characteristics of the device. Faults such as open or short circuits and degradation of device characteristics can be detected by this method as well. X-ray examination may be done at this stage to find any internal defects in the component.

Microscopy techniques

- Low-magnification microscopy - The component is initially examined under a low-power optical microscope with a magnification of 10X to 100X. This is used to observe any external damage, check the logo on the package (to verify die lots and detect spurious devices), and detect handling damage and hair-line fractures in the component's leads or pins. These observations should be recorded as part of the analysis. Photograph 1 shows a typical fault that can be observed by this method. Notice the hair-line crack in the plastic encapsulation of the IC due to internally generated thermal overstress.
- High-magnification optical microscopy - After a device is decapsulated, the inner structure can be examined with a higher-magnification (up to 2000X) optical microscope. Such a microscope is used to reveal damage to the internal structure. Before examination, the sample should be ultrasonically cleaned to remove fine particles of dust that may adhere to the device structure after the decapsulation process. Damage due to EOS, corrosion of metalization patterns, damage to bond wires, oxide layer damage, and spiking faults can be identified by this method. Photographs 2, 3 and 4 show examples of such observations.
- Infra-red (IR) microscopy - This method relies on the transparency of silicon to IR wavelengths. Using this technique, certain types of failure, such as ball bond defects, corrosion, intermetallic diffusion, overstress effects, and spiking across layers, can be identified. To prepare a sample for IR microscopy, the back side of the package is polished to remove the encapsulation. Polishing is stopped on reaching the die surface. The device's lower layers, which cannot be seen by decapsulating the upper side of the package, can be viewed by this technique. Because the upper layer of the device is not damaged by chemical etching, electrical measurements can be made on the sample if required. ESD and corrosion damage in the inner layers of devices can be identified using the IR technique.

Other techniques used to study the internal structure of devices are scanning acoustic microscopy, scanning electron microscopy, X-ray imaging, and thermal imaging. These techniques are used when optical techniques do not help identify the problem. Finally, micro-probing may be used to make measurements and identify the failed nodes in the device. Photograph 5 shows a case of EOS damage observed using an IR microscope.

Decapsulation techniques

Decapsulation is an FA technique used for internal examination of the device. With rapid strides being made in device technology, new types of packaging materials for encapsulation are being developed and used in the manufacture of semiconductor devices. Depending on the package material used, different techniques are used for opening devices. Commonly used package types are plastic, ceramic, and metal-can. Plastic encapsulation is etched away by chemical agents such as hot fuming nitric acid or sulphuric acid delivered through a jet delivery system. A number of different etching agents are available, so using the proper one is important [4].
Metal and ceramic packages also have specific decapsulation methods. The following are typical methods for device decapsulation:

- Ceramic packages are opened by removing the encapsulation mechanically. The tools used rely on fracturing the brittle ceramic package by applying pressure.
- Metal-can packages such as transistors are opened using rotary cutters fitted with sharp blades.
- Metal-lidded packages such as those used in some LSI devices are opened mechanically by lifting a corner of the seal after sawing. Thermally opening the solder seal is also possible, provided care is taken to avoid thermal overstress damage to the die within. In all cases, care should be taken to ensure that the tool does not damage the interconnections or the die during the decapsulation process.
- Glass packages are delicate and require careful handling. They are opened by mechanically lapping the package along the axis until the active device region is reached.

The sidebar contains an overview of common failures in semiconductor devices and passive components.

Looking ahead

Miniaturization of electronic components is progressing at a rapid pace, and device geometries have shrunk over the years with a corresponding increase in circuit complexity. New materials are being developed to package devices at lower cost and to protect them against thermal stress and moisture. Higher complexity and finer device geometries will present problems for failure analysis work and require new types of instrumentation techniques to probe into the die-level world. A failure analyst should have a thorough knowledge of component and system design techniques to tackle failure analysis tasks effectively. The rapid strides in integrated circuit technology and microelectronic packaging techniques will pose many challenges for failure analysts in the coming years.

References

[1] Edgar A. Doyle: How Parts Fail, IEEE Spectrum, October 1981.
[2] Amerasekera E. A., et al: Failure mechanisms in semiconductor devices, John Wiley & Sons, 1987.
[3] Avram Bar-Cohen, et al: Advances in Thermal Modeling of Electronic Components and Systems, Vol. 3, IEEE Press, 1993.
[4] Emiliano Pollino: Microelectronic Reliability, Vol. II, Artech House, 1989.
[5] Giulio Di Giacomo: Reliability of Electronic Packages and Semiconductor devices, McGraw Hill, 1997.
[6] Lakshminarayanan, V.: Minimizing Failures in Electronic Systems by Design, EDN, August 3, 2000.

Semiconductor devices

- Penetration of moisture or flux contaminants during soldering, washing of boards, or storage under humid conditions, due to seal integrity problems.
- Mechanical stress cracks due to differential thermal expansion of plastic encapsulant, metal leads, die.
- Chip-to-substrate attachment failure leading to voids and thermal stress problems.
- Bond wire snapping due to EOS.
- Deformation of bond wires due to improper bonding.
- Cracks at the bonding pad-bond wire junction.
- Metalization damage due to EOS, ESD, corrosion.
- Electromigration of metal along the direction of current flow.
- Hillock formation by metal ions.
- Degradation of metalization at high temperature.
- Oxide layer faults due to impurities, ESD damage, pin-holes due to etching processes.
- Defects in the bulk semiconductor material, such as crystal defects.
- Design and fabrication faults, misalignment of layers, geometric defects.
- Leakage at p-n junction.
- Deviation from the normal characteristics of the device.
- Changes in threshold voltage/current characteristics.

Resistors

- Open circuit caused by thermal overstress due to EOS (high current flow leading to increased I²R loss).
- Cracks at the lead-body interface leading to open circuit.
- Degradation in value due to application of high levels of stress, exposure to high humidity conditions, high temperature operating environment.
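The I²R loss named in the resistor list grows with the square of the current, which is why a moderate overcurrent can push a resistor far beyond its power rating. The short sketch below works through one case. The helper names and the 100 ohm, 0.25 W part are assumptions chosen for illustration, not values from the article.

```python
import math


def dissipated_power(current_a, resistance_ohm):
    """Power dissipated in a resistor, P = I^2 * R, in watts."""
    return current_a ** 2 * resistance_ohm


def max_continuous_current(rated_power_w, resistance_ohm):
    """Largest current that keeps dissipation at or below the power rating."""
    return math.sqrt(rated_power_w / resistance_ohm)


if __name__ == "__main__":
    resistance = 100.0   # ohms (assumed part)
    rating = 0.25        # watts (assumed part)
    limit = max_continuous_current(rating, resistance)
    print(f"Current limit at rated power: {limit * 1000:.0f} mA")
    for factor in (1, 2, 3):
        power = dissipated_power(factor * limit, resistance)
        print(f"{factor}x limit current -> {power:.2f} W "
              f"({power / rating:.0f}x the rating)")
```

Doubling the current quadruples the dissipation, and tripling it gives nine times the rated power, so thermal overstress sets in quickly once the current exceeds the limit.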

Capacitors

- Rupture of oxide film in electrolytic capacitors due to application of high electric field.
- Leakage of electrolyte in electrolytic capacitors due to high temperature, faulty seal.
- Moisture ingress due to voids between the leads and body leading to a short circuit.
- Leakage current.
- Degradation of dielectric material due to exposure to humidity, high temperature, aging.
- A unique property of aluminum electrolytic capacitors is that they get set to the voltage at which they are operated even though the rated voltage may be higher. Hence excessive derating of applied voltage should not be done in the case of such capacitors.
- Shift in parameters.
- Lowering of insulation resistance.
- Open circuit failure.
- Short circuit failure.
- Corrosion of the electrodes due to chemical action caused by contaminants and moisture.
- Polarity reversal in electrolytic capacitors can cause damage.
- Disconnection of lead wires from the terminations.
- Drying up of electrolyte due to operation at high temperature.
- Dielectric breakdown due to application of high voltage beyond the rating.

Coils

- Open circuit of coil wire due to thermal overstress caused by shorting of adjacent turns where insulation has been damaged during the winding process or due to a manufacturing process fault.
- Nicks and kinks in the wire can cause the above failure to occur.

Transformers

- Open circuit fault in primary and secondary windings due to excessive thermal stress caused by EOS, shorting of windings as in the case of coils.
- High levels of parasitics such as leakage inductance, inter-winding capacitance due to faulty design and manufacturing technique.
- Short circuit between primary and secondary due to poor isolation, low dielectric withstanding voltage.
- High levels of copper and eddy current losses, which lead to high heat dissipation in the transformer and affect adjacent components. These are basically caused by poor design.
- Corona discharge can sometimes occur between adjacent turns or windings. To prevent this, the transformer should be properly impregnated.

Relays

- Arcing-induced damage of contacts.
- Corrosion of contacts due to ingress of moisture, flux, cleaning agents due to improper sealing.
- Melting of contacts due to electrical overstress (EOS).
- Coil damage due to EOS.
- Damage to plastic body due to exposure to high temperature, e.g., during soldering or from internally generated heat due to EOS.

Printed circuit boards

- Discoloration due to exposure to high temperature during soldering, heat dissipation of components on the board.
- Delamination due to exposure to high temperature.
- Warping due to exposure to high temperature, faulty board design (insufficient thickness of the laminate, faulty layout and mounting of components on the board).

The commonly observed failure mechanisms, their causes, the analysis techniques used to detect them, and the test screens used to precipitate them are listed in Table 2.