Normalized Mantissa Binary Options

Method of and apparatus for normalization of a floating point binary number

US 5513362 A

A post-processing is executed on a mantissa M and an exponent E of a floating point binary number obtained as a result of subtraction, thereby to obtain a mantissa m and an exponent e of the result of the post-processing. To this end, an output E-1 of a decrementer and an output LSA (the amount of cancellation of the mantissa) of an advancing 1 detecting circuit are entered into a minimum value selecting circuit. When E-1 is smaller than LSA, that is, when a denormalize processing is required, this circuit sets a shift amount SH to E-1 and a magnitude-relation judging signal CR to 1. When E-1 is not smaller than LSA, that is, when a normalize processing is required, SH is set to LSA and CR is set to 0. A left shifter is adapted to supply, as the mantissa m of the result, a value obtained by executing, on the mantissa M, a left shift processing having the shift amount SH. A selecting circuit is adapted to supply, as the exponent e of the result, 0 when CR is equal to 1, and an output E-LSA of a subtracting circuit when CR is equal to 0. This enables a denormalize processing of a floating point binary number to be executed at a high speed equivalent to that at which a normalize processing is executed.
1. An operational processing apparatus for executing a shift processing on a mantissa, having a plurality of bit positions and a radix point, of a floating point binary number and for adjusting an exponent of said floating point binary number, said apparatus comprising:
advancing 1 detecting means for detecting the bit position of an advancing 1 in said mantissa and for supplying, as an amount of cancellation of said mantissa, a difference between said bit position and the bit position of the bit one position more significant than the radix point;
decrementing means for supplying a value obtained by subtracting 1 from said exponent;
comparing means for comparing, in magnitude with each other, two input data, i.e., an output of said decrementing means and the amount of cancellation supplied from said advancing 1 detecting means, thereby to supply, as a result of magnitude-relation judgment, the input data whichever is smaller, and also for supplying a magnitude-relation judging signal representing which input data is smaller out of said two input data;
subtracting means for supplying a value obtained by subtracting, from the exponent, the amount of cancellation supplied from said advancing 1 detecting means;
selecting means for supplying, as an exponent of a result of an operational processing, 0 when the magnitude-relation judging signal from said comparing means represents that the output of said decrementing means is the smaller of said two input data, and an output of said subtracting means when said magnitude-relation judging signal represents otherwise; and
shifting means for supplying, as a mantissa of the result of the operational processing, a value obtained by executing, on said mantissa of said floating point binary number, a left shift processing in which the shift amount is equal to the result of magnitude-relation judgment, having a plurality of bits, supplied from said comparing means;
wherein said comparing means has a minimum value selecting circuit for propagating the magnitude relation of the two input data for each digit thereof from a most significant digit to a least significant digit, whereby the result of magnitude-relation judgment is determined starting with the most significant digit, and said shifting means includes left 2^k-bit shifters (k = 0, 1, 2, ..., n-1) which respectively correspond to the lower n bits of the result of magnitude-relation judgment supplied from said minimum value selecting circuit and which are connected in cascade to one another.
2. An operational processing apparatus for executing a shift processing on a mantissa, having a plurality of bit positions and a radix point, of a floating point binary number and for adjusting an exponent of said floating point binary number, said apparatus comprising:
advancing 1 detecting means for detecting the bit position of an advancing 1 in said mantissa and for supplying, as an amount of cancellation of said mantissa, a difference between said bit position and the bit position of the bit one position more significant than the radix point;
decrementing means for supplying a value obtained by subtracting 1 from said exponent;
comparing and selecting means for comparing, in magnitude with each other, two input data, i.e., the amount of cancellation supplied from said advancing 1 detecting means and the exponent, thereby to supply, as a result of magnitude-relation judgment, said amount of cancellation when said amount of cancellation is smaller than the exponent, and an output of said decrementing means when said amount of cancellation is not smaller than the exponent, and also for supplying a magnitude-relation judging signal representing which input data is smaller out of said two input data;
subtracting means for supplying a value obtained by subtracting, from the exponent, the amount of cancellation supplied from said advancing 1 detecting means;
selecting means for supplying, as an exponent of a result of an operational processing, an output of said subtracting means when the magnitude-relation judging signal from said comparing and selecting means represents that, out of said two input data, said amount of cancellation supplied from said advancing 1 detecting means is smaller, and 0 when said magnitude-relation judging signal represents otherwise; and
shifting means for supplying, as a mantissa of said result of the operational processing, a value obtained by executing, on said mantissa of the floating point binary number, a left shift processing in which the shift amount is equal to the result of magnitude-relation judgment, having a plurality of bits, supplied from said comparing and selecting means;
wherein said comparing and selecting means has a comparing and selecting circuit for propagating the magnitude relation of the two input data for each digit thereof from a most significant digit to a least significant digit, thereby to determine the result of magnitude-relation judgment starting with the most significant digit, and said shifting means includes left 2^k-bit shifters (k = 0, 1, 2, ..., n-1) which respectively correspond to the lower n bits of the result of magnitude-relation judgment supplied from said comparing and selecting circuit and which are connected in cascade to one another.

3. An operational processing apparatus for executing a shift processing on a mantissa, having a plurality of bit positions and a radix point, of a floating point binary number and for adjusting an exponent of said floating point binary number, said apparatus comprising:
advancing 1 detecting means for detecting the bit position of an advancing 1 in said mantissa and for supplying, as an amount of cancellation of said mantissa, a difference between said bit position and the bit position of the bit one position more significant than the radix point;
subtracting means for supplying, as a result of subtraction, a value obtained by subtracting, from the exponent, the amount of cancellation supplied from said advancing 1 detecting means, and also for supplying a magnitude-relation judging signal representing whether or not said exponent is equal to or smaller than said amount of cancellation;
first selecting means for supplying, as an exponent of a result of an operational processing, 0 when the magnitude-relation judging signal from said subtracting means represents that the exponent is not greater than said amount of cancellation, and the result of subtraction supplied from said subtracting means when said magnitude-relation judging signal represents otherwise;
second selecting means for supplying the exponent when the magnitude-relation judging signal from said subtracting means represents that the exponent is not greater than said amount of cancellation, and said amount of cancellation supplied from said advancing 1 detecting means when said magnitude-relation judging signal represents otherwise; and
shift processing means for supplying, as a mantissa of the result of the operational processing, a value obtained by executing, on said mantissa of said floating point binary number, a left shift processing in which the shift amount is equal to a value obtained by subtracting 1 from an output of said second selecting means when the magnitude-relation judging signal from said subtracting means represents that the exponent is not greater than said amount of cancellation, and in which the shift amount is equal to said output itself of said second selecting means when said magnitude-relation judging signal represents otherwise;
wherein said shift processing means has a left shifter for supplying a value obtained by executing, on the mantissa, a left shift processing in which the shift amount is equal to an output of said second selecting means, and a right 1-bit shifter for supplying, as a mantissa of the result of an operational processing, a value obtained by executing a right 1-bit shift processing on an output of said left shifter when the magnitude-relation judging signal from said subtracting means represents that the exponent is not greater than the amount of cancellation, and
the output itself of said left shifter ...

The present invention relates to a method of and an apparatus for executing an operational processing using a binary number in a floating point representation according to IEEE (Institute of Electrical and Electronics Engineers) Standard 754 or one conforming thereto.

With the recent growth in complexity of scientific and engineering computation and of graphics processing, demand for fast and accurate floating point operations is increasing. A computer is adapted to execute processing with only a limited number of digits in a floating point number. Accordingly, errors often occur in a result obtained by a floating point operation. Operational precision depends substantially on the machine arrangement of a computer, but by following IEEE Std 754, errors that depend on the hardware arrangement can be prevented.

In IEEE Std 754, a format whose total bit number is 32, including a 1-bit sign S, an 8-bit exponent E and a 23-bit fraction F, is specified for a single-precision floating point binary number. Likewise, a format whose total bit number is 64, including a 1-bit sign S, an 11-bit exponent E and a 52-bit fraction F, is specified for a double-precision floating point binary number. Generally, a floating point number is used for which normalization has been executed such that an implicit non-zero bit and the radix point are located above the most significant bit (MSB) of the fraction F. Further, a bias is given to the real exponent so that the exponent E is a positive value. For single precision, for example, a value obtained by adding 127 as a bias to the real exponent is used as the exponent E. That is, a real number R1 expressed as a single-precision normalized number is expressed as follows:

$$R_1 = (-1)^S \times 2^{E-127} \times (1.F) \qquad (1)$$

where 1.F is the mantissa M.
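As a minimal sketch of the single-precision format just described, the following Python snippet splits a 32-bit pattern into S, E and F and evaluates the standard IEEE 754 expression (-1)^S × 2^(E-127) × 1.F for normalized numbers. The function names `decode_single` and `float_to_bits` are illustrative choices, not names from the patent.

```python
import struct

def float_to_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def decode_single(bits: int) -> float:
    """Evaluate (-1)^S * 2^(E-127) * 1.F for a normalized single-precision pattern."""
    s = (bits >> 31) & 0x1        # 1-bit sign S
    e = (bits >> 23) & 0xFF       # 8-bit biased exponent E
    f = bits & 0x7FFFFF           # 23-bit fraction F
    if not 1 <= e <= 254:
        raise ValueError("not a normalized number (E must be 1..254)")
    mantissa = 1.0 + f / 2**23    # mantissa M = 1.F
    return (-1.0) ** s * 2.0 ** (e - 127) * mantissa
```

For instance, `decode_single(0x3F800000)` evaluates S = 0, E = 127, F = 0 and returns 1.0.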
IEEE Std 754 defines that when an operational result is a value in the neighborhood of 0, it is represented as a denormalized number. For single precision, the exponent E is made 0 and a denormalize processing is executed to shift the fraction F such that the weight of the zero-valued bit one position above the radix point is 2^-126. In this case, a real value R2 expressed as a denormalized number is expressed as follows:

$$R_2 = (-1)^S \times 2^{-126} \times (0.F) \qquad (2)$$

where 0.F is the mantissa M.

There is a phenomenon in which the number of significant digits is greatly reduced when two numbers whose absolute values are substantially the same and whose signs differ from each other are added. Such a phenomenon is called cancellation. In subtraction of floating point numbers slightly different in value from each other, when the exponent of the minuend is equal to the exponent of the subtrahend, subtraction of their mantissas is executed without a digit alignment operation. For example, when the mantissa of the minuend is 1.100101 and the mantissa of the subtrahend is 1.100010, the result of the subtraction of the mantissas is equal to 0.000011. Thus, when the value of the bit one position above the radix point is 0 in the result of an operation, it is said that cancellation of the mantissa has occurred. The number of zeros appearing consecutively from the position of the bit one position above the radix point is called the amount of cancellation of the mantissa. In this example, the amount of cancellation of the mantissa is 5.

A floating point number presenting such cancellation of the mantissa is normalized by executing, on the mantissa M, a left shift processing having a shift amount equal to the amount of cancellation, and by correcting the exponent E such that the amount of cancellation is subtracted from the exponent E. In the following description, the left shift amount required when cancellation of the mantissa has occurred will be expressed as an amount of cancellation LSA.
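The cancellation example above can be checked with a short Python sketch. It assumes the mantissa is held as an integer with `frac_bits` fraction bits, so that bit number `frac_bits` is the bit one position above the radix point; the function name `cancellation_amount` is hypothetical.

```python
def cancellation_amount(mantissa: int, frac_bits: int) -> int:
    """Count the zeros from the bit above the radix point down to the leading 1."""
    if mantissa <= 0 or mantissa >> (frac_bits + 1):
        raise ValueError("mantissa must be nonzero and fit in frac_bits + 1 bits")
    leading_one = mantissa.bit_length() - 1   # bit position of the leading 1
    return frac_bits - leading_one

# 1.100101 - 1.100010 = 0.000011, using 6 fraction bits
difference = 0b1100101 - 0b1100010
lsa = cancellation_amount(difference, frac_bits=6)   # amount of cancellation
normalized = difference << lsa                       # left shift by LSA
```

Here `lsa` comes out as 5, matching the text, and the shifted mantissa is 1.100000. The exponent would be reduced by the same amount.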
When the exponent E is not greater than the amount of cancellation of the mantissa LSA and the amount of cancellation LSA is subtracted from the exponent E for normalization, the exponent after correction is not greater than 0. When an operational result thus cannot be expressed as a normalized number, the denormalize processing mentioned above is required.

The hardware of a conventional computer is adapted to execute processing of normalized numbers. More specifically, when it is judged that a value obtained by executing a normalize processing on an operational result in hardware cannot be expressed as a normalized number, the normalize processing is interrupted on the assumption that an exception has occurred, and a denormalize processing is then assigned to software. Accordingly, the denormalize processing is executed after the normalize processing has been executed. This presents the problem that a desired operational result cannot be obtained at a high speed.

SUMMARY OF THE INVENTION

It is an object of the present invention to enable a denormalize processing of a floating point binary number to be executed at a high speed equivalent to the speed at which a normalize processing is executed.

To achieve the above object, the present invention is arranged such that, before execution of a normalize processing, an exponent E and an amount of cancellation of the mantissa LSA are compared in magnitude with each other and, based on the comparison result, either a normalize processing or a denormalize processing is executed.

According to the present invention, an exponent E and an amount of cancellation of the mantissa LSA are compared in magnitude with each other, and it is judged whether the result of an operational processing is a normalized number or a denormalized number.
When the result of an operational processing is a normalized number (E is greater than LSA), the amount of cancellation LSA is selected as a shift amount SH for the mantissa M, and a value obtained by subtracting the amount of cancellation LSA from the exponent E is selected as an exponent e of the result (normalize processing). On the other hand, when the result of an operational processing is a denormalized number (E is not greater than LSA), a value obtained by subtracting 1 from the exponent E is selected as the shift amount SH for the mantissa M, and 0 is selected as the exponent e of the result (denormalize processing). Thus, even when the result of an operational processing is a denormalized number, the processing can be executed at a high speed in the same manner as for a normalized number.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG 1 is a flow chart showing the flow of processing in an operational processing method according to an embodiment of the present invention.
FIG 2 is a block diagram showing the arrangement of a first operational processing apparatus according to an embodiment of the present invention.
FIG 3 is a circuit diagram showing the internal arrangement of the minimum value selecting circuit shown in FIG 2.
FIG 4 is a block diagram showing the arrangement of a second operational processing apparatus according to an embodiment of the present invention.
FIG 5 is a circuit diagram showing the internal arrangement of the comparing and selecting circuit shown in FIG 4.
FIG 6 is a block diagram showing the arrangement of a third operational processing apparatus according to an embodiment of the present invention.
FIG 7 is a circuit diagram showing the internal arrangement of the subtracting circuit shown in FIG 6; and
FIG 8 is a block diagram showing the arrangement of a fourth operational processing apparatus according to an embodiment of the present invention.
With reference to the attached drawings, the following description will discuss an operational processing method according to an embodiment of the present invention and operational processing apparatus to be used for carrying out that method.

FIG 1 shows a sequence of executing a post-processing on a mantissa M and an exponent E of an input floating point binary number obtained as a result of an operation, for example subtraction of normalized numbers, so as to convert the mantissa M and the exponent E into a mantissa m and an exponent e of an output floating point binary number. The following description discusses the sequence step by step for single precision, but the operational processing method shown in FIG 1 can also be applied to double precision.

To obtain an amount of cancellation of the mantissa LSA, the position of the advancing 1 (the leading 1) in the mantissa M is first detected. The amount of cancellation LSA is obtained as the difference between the bit position of the advancing 1 thus detected and the position of the bit one position above the radix point (step 101). Then, the exponent E and the amount of cancellation LSA are compared in magnitude with each other (step 102).

When E is not greater than LSA, a denormalize processing is executed so that the result of the operational processing is expressed as a denormalized number. Accordingly, it is necessary to reduce the exponent E such that it becomes equal to 0, and to execute, on the mantissa M, a left shift processing having a shift amount corresponding to the amount of that reduction. Under the normalized-number formula, the bit one position above the radix point would have a weight of 2^-127 once E is reduced to 0, but the weight of that bit in a denormalized number is 2^-126, as shown in equation (2). Accordingly, 1 bit must be subtracted from the shift amount when the left shift processing is executed on the mantissa M. In this connection, the shift amount SH for the mantissa is set to E-1 (step 103), and the exponent e of the result of the operational processing is set to 0 (step 104).
On the other hand, when E is greater than LSA, the shift amount SH for the mantissa is set to LSA in order to execute a normalize processing (step 105), and the exponent e of the result of the operational processing is set to E-LSA (step 106). At this time, the exponent e = E-LSA is positive.

At step 107, a left shift processing is executed on the mantissa M according to the shift amount SH obtained at step 103 or 105, thereby obtaining a mantissa m of the result of the operational processing.

According to the operational processing method described above, the flow of the processing is controlled based on the result of the comparison in magnitude between the exponent E and the amount of cancellation of the mantissa LSA. Accordingly, even when the result of an operational processing is a denormalized number, the processing can be executed at a high speed equivalent to that for a normalized number.

Alternatively, step 103 may be changed such that the shift amount SH is set to E instead of E-1, and a right 1-bit shift processing may additionally be executed on the mantissa M, only when E is not greater than LSA, before or after step 107 where the left shift processing is executed on the mantissa M.

The following description will successively discuss first to fourth operational processing apparatus to be used for carrying out the operational processing method mentioned above.

The first operational processing apparatus shown in FIG 2 comprises a decrementer 201, an advancing 1 detecting circuit 202, a minimum value selecting circuit 203, a left shifter unit 204, a mantissa result register 205, a subtracting circuit 206, a selecting circuit 207 and an exponent result register 208.
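The post-processing sequence of FIG 1 (steps 101 to 107) can be sketched in Python as follows. This is an illustrative model only, under these assumptions: single precision, M held as a nonzero 24-bit integer whose bit 23 is the bit above the radix point, and E a biased exponent of at least 1. The helper `value`, which evaluates the represented magnitude exactly, is added here just to check that the post-processing preserves the value.

```python
from fractions import Fraction

FRAC_BITS = 23   # single precision: 23 fraction bits, so M has 24 bits
BIAS = 127

def postprocess(M: int, E: int) -> tuple[int, int]:
    """Return (m, e) from (M, E) following steps 101-107 of FIG 1."""
    if not 0 < M < (1 << (FRAC_BITS + 1)):
        raise ValueError("M must be a nonzero 24-bit mantissa")
    if E < 1:
        raise ValueError("E must be at least 1")
    lsa = FRAC_BITS - (M.bit_length() - 1)   # step 101: amount of cancellation
    if E <= lsa:                             # step 102: compare E with LSA
        sh, e = E - 1, 0                     # steps 103, 104: denormalize
    else:
        sh, e = lsa, E - lsa                 # steps 105, 106: normalize
    return M << sh, e                        # step 107: left shift by SH

def value(m: int, e: int) -> Fraction:
    """Magnitude represented by mantissa m and exponent e (e == 0: denormalized)."""
    scale = Fraction(2) ** ((e - BIAS) if e > 0 else (1 - BIAS))
    return Fraction(m, 1 << FRAC_BITS) * scale
```

With M = 0b11 (LSA = 22), `postprocess(0b11, 30)` takes the normalize branch and yields `(0xC00000, 8)`, while `postprocess(0b11, 10)` takes the denormalize branch and yields `(1536, 0)`. In both cases `value` of the output equals `value` of the input, and the alternative of shifting left by E and then right by 1 gives the same mantissa.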
The decrementer 201 is adapted to supply a value obtained by subtracting 1 from the exponent E. The advancing 1 detecting circuit 202 is adapted to search the mantissa M in the direction from the bit one position above the radix point toward the least significant bit (LSB), thereby to detect the position of the first bit equal to 1, and is also adapted to supply, as an amount of cancellation LSA, the difference between the position of the bit thus detected and the position of the bit one position above the radix point.

The minimum value selecting circuit 203 is adapted to compare, in magnitude with each other, two input data, i.e., an output E-1 of the decrementer 201 and an output LSA of the advancing 1 detecting circuit 202, thereby to supply, as a shift amount SH, the input data whichever is smaller, and to supply a magnitude-relation judging signal CR representing which input data is smaller out of the two input data. When E-1 is smaller than LSA, that is, when E is not greater than LSA, SH is equal to E-1 and CR is equal to 1. When E-1 is not smaller than LSA, that is, when E is greater than LSA, SH is equal to LSA and CR is equal to 0.

The left shifter unit 204 is adapted to supply, as a mantissa m of the result of an operational processing, a value obtained by executing, on the mantissa M, a left shift processing having a shift amount specified by the output SH of the minimum value selecting circuit 203. The mantissa result register 205 is adapted to store the output m of the left shifter unit 204.
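The data flow of the first apparatus of FIG 2 can be modeled behaviorally as below. The sketch is not a gate-level description: it assumes a nonzero 24-bit mantissa and E of at least 1, and the function names are illustrative. It shows that selecting the minimum of E-1 and LSA gives the shift amount for both the normalize and the denormalize case, with CR steering the exponent path through subtracting circuit 206 and selecting circuit 207.

```python
def minimum_value_select(x: int, y: int) -> tuple[int, int]:
    """Model of circuit 203: return (min(x, y), B) with B = 1 when x < y."""
    b = 1 if x < y else 0
    return (x if b else y), b

def first_apparatus(M: int, E: int, frac_bits: int = 23) -> tuple[int, int]:
    """Behavioral model of FIG 2; returns (m, e)."""
    e_minus_1 = E - 1                               # decrementer 201
    lsa = frac_bits - (M.bit_length() - 1)          # advancing 1 detecting circuit 202
    sh, cr = minimum_value_select(e_minus_1, lsa)   # minimum value selecting circuit 203
    m = M << sh                                     # left shifter unit 204
    e = 0 if cr else E - lsa                        # circuits 206 and 207
    return m, e
```

Because E and LSA are integers, the condition "E-1 is smaller than LSA" is the same as "E is not greater than LSA", so this model agrees with the step-by-step method of FIG 1.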
The subtracting circuit 206 is adapted to supply a value obtained by subtracting the output LSA of the advancing 1 detecting circuit 202 from the exponent E. The selecting circuit 207 is adapted to supply, as an exponent e of the result of an operational processing, 0 when CR is equal to 1, and the output E-LSA of the subtracting circuit 206 when CR is equal to 0. The exponent result register 208 is adapted to store the output e of the selecting circuit 207.

According to the arrangement in FIG 2, the minimum value selecting circuit 203 judges whether the result of an operational processing is a normalized number or a denormalized number, based on whether or not a value obtained by subtracting the output LSA of the advancing 1 detecting circuit 202 from the output E-1 of the decrementer 201 is negative. The shift amount SH for the mantissa M and the exponent e of the result of an operational processing are determined such that, based on the judgment thus made, either a normalize processing or a denormalize processing is executed. At this time, the left shifter unit 204 is used in common for both the normalize processing and the denormalize processing.

The minimum value selecting circuit 203 in FIG 2 has the function that two 8-bit input data X, Y are compared in magnitude with each other, the input data whichever is smaller is set as an output data Z, and the logical value of a magnitude-relation judging signal output terminal B is set to 1 when X is smaller than Y. As shown in FIG 3, the minimum value selecting circuit 203 has an input circuit 311, an intermediate circuit 312 and an output circuit 313, and is arranged such that the magnitude relation of the two input data X, Y for each of the digits is propagated from the highest digit to the lowest digit, thus determining at a high speed the output data Z successively, starting with the highest digit (see Japanese Laid-Open Patent Publication 3-12735).
When the respective bits of the input and output data X, Y, Z are denoted Xi, Yi, Zi (i = 0 to 7), a magnitude-relation determining function gi and a magnitude-relation holding function pi are formed for each digit in the input circuit 311. gi = 1 represents that Xi is smaller than Yi, and pi = 1 represents that Xi is equal to Yi.

The intermediate circuit 312 forms, based on the outputs gi and pi of the input circuit 311, a magnitude-relation determining function gjk and a magnitude-relation holding function pjk for the digits from the jth digit to the kth digit (j is smaller than k). For example, g67 = 1 represents the two-bit magnitude relation that X7X6 is smaller than Y7Y6, and p67 = 1 represents the two-bit equivalence relation that X7X6 is equal to Y7Y6. Further, g47 = 1 represents the four-bit magnitude relation that X7X6X5X4 is smaller than Y7Y6Y5Y4, and p47 = 1 represents the four-bit equivalence relation that X7X6X5X4 is equal to Y7Y6Y5Y4. These magnitude-relation determining functions gi, gjk and magnitude-relation holding functions pi, pjk are propagated from the highest digit to the lowest digit.

When the magnitude-relation determining function gi7 for the digits from each digit (the ith digit) to the highest digit (the 7th digit) is obtained in the manner above, Xi is selected in each digit when gi7 is equal to 1, and Yi is selected in each digit when gi7 is equal to 0. The Xi or Yi thus selected is set as Zi. Thus, the 8-bit output data Z representing the minimum value can be obtained successively from the highest bit. In the output circuit 313 in FIG 3, however, Z7 and Z6 are determined according to g7 and g67, respectively, Z5 and Z4 are determined according to g47, and Z3 to Z0 are determined according to g07. The magnitude-relation determining function g07 for the digits from the 0th digit to the 7th digit, which is equal to 1 when X is smaller than Y and equal to 0 when X is not smaller than Y, is supplied from the magnitude-relation judging signal output terminal B.
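The digit-by-digit selection described above can be illustrated with a bit-serial Python model. It uses the per-digit rule stated in the text (select Xi when gi7 is 1, else Yi) rather than the grouped output circuit 313 of FIG 3, and the recurrence used to accumulate gi7 and pi7 from the highest digit downward is an assumption of this sketch, not a transcription of the intermediate circuit 312.

```python
def msb_first_min(x: int, y: int, width: int = 8) -> tuple[int, int]:
    """Return (Z, B): Z = min(x, y) built from the highest bit down, B = 1 iff x < y."""
    g_run = 0   # g_{i,7}: bits 7..i of X are smaller than bits 7..i of Y
    p_run = 1   # p_{i,7}: bits 7..i of X equal bits 7..i of Y
    z = 0
    for i in reversed(range(width)):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        gi = (1 - xi) & yi               # gi = 1 when Xi < Yi
        pi = 1 - (xi ^ yi)               # pi = 1 when Xi == Yi
        g_run = g_run | (p_run & gi)     # extend the relation down to digit i
        p_run = p_run & pi
        zi = xi if g_run else yi         # select Xi when g_{i,7} = 1, else Yi
        z |= zi << i
    return z, g_run                      # g_{0,7} is the signal at terminal B
```

Each output bit Zi depends only on digits i and above, which is why a shifter stage controlled by a high-order bit of Z can start before the low-order bits are settled.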
As shown in FIG 2, the left shifter unit 204 is formed by connecting five left shifters (16-bit, 8-bit, 4-bit, 2-bit and 1-bit) to one another, arranged in this order from the input side of the mantissa M. The lower five bits of the output Z7 to Z0 of the minimum value selecting circuit 203 serve as control signals for the respective five left shifters. More specifically, as the output (shift amount SH) of the minimum value selecting circuit 203 is determined successively from the highest bit, the shifters in the left shifter unit 204 are operated successively, starting with the 16-bit shifter in which the shift amount is the greatest. Accordingly, each time one of the digits of the output of the minimum value selecting circuit 203 is determined, successively from the highest digit, a left shift processing with a shift amount of 2^k bits corresponding to the digit thus determined is executed on the mantissa M.

As discussed above, the arrangement in FIGS 2 and 3 has the minimum value selecting circuit 203 for determining the output data Z successively from the highest digit, and the multi-stage left shifter unit 204 having a plurality of shifters to be successively operated, starting with the shifter in which the shift amount is the greatest. This enables the left shift processing on the mantissa M to be executed at a high speed. The minimum value selecting circuit 203 is of the 8-bit arrangement and the left shifter unit 204 is of the 5-stage arrangement of left 2^k-bit shifters (k = 0 to 4), with the number of bits of each of the mantissa M and the exponent E for single precision taken into consideration. However, such arrangements may be suitably changed according to the number of bits of each of the mantissa M and the exponent E.

In a second operational processing apparatus shown in FIG 4, the minimum value selecting circuit 203 shown in FIG 2 is replaced with a comparing and selecting circuit 401. A selecting circuit 402 in FIG 4 differs from the selecting circuit 207 in FIG 2 in that the selecting circuit 402 is adapted to supply the output E-LSA of the subtracting circuit 206 when CR is equal to 1, and 0 when CR is equal to 0.
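The cascaded left shifter unit 204 described above (16-bit, 8-bit, 4-bit, 2-bit and 1-bit stages controlled by the lower five bits of SH) can be modeled as follows. This is a behavioral sketch: Python integers are unbounded, so no register width or overflow is modeled, and the function name is illustrative.

```python
def cascaded_left_shift(m: int, sh: int, stages: int = 5) -> int:
    """Shift m left by the lower `stages` bits of sh, largest stage first."""
    for k in reversed(range(stages)):   # k = 4, 3, 2, 1, 0
        if (sh >> k) & 1:               # control bit k of the shift amount
            m <<= 1 << k                # this stage shifts by 2^k bits
    return m
```

Because stage k needs only bit k of SH, the 16-bit stage can operate as soon as the corresponding high-order bit of the shift amount is known, which is the property the text relies on.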
The comparing and selecting circuit 401 is adapted to compare, in magnitude with each other, two input data, i.e., the output LSA of the advancing 1 detecting circuit 202 and the exponent E, and to supply, as a shift amount SH, the output LSA when LSA is smaller than the exponent E, and the output E-1 of the decrementer 201 when the output LSA is not smaller than the exponent E. The comparing and selecting circuit 401 is also adapted to supply a magnitude-relation judging signal CR representing which of LSA and E is smaller. When LSA is smaller than E, SH is equal to LSA and CR is equal to 1; when LSA is not smaller than E, SH is equal to E-1 and CR is equal to 0.

According to the arrangement in FIG 4, the comparing and selecting circuit 401 is adapted to judge whether the result of an operational processing is a normalized number or a denormalized number, based on whether or not a value obtained by subtracting the exponent E from the output LSA of the advancing 1 detecting circuit 202 is negative. Unlike the minimum value selecting circuit 203 in FIG 2, the comparing and selecting circuit 401 can start comparing the two input data in magnitude before the output of the decrementer 201 is determined, so that the judgment can be made at a higher speed. The shift amount SH for the mantissa M and the exponent e of the result of an operational processing can be determined such that, based on the judgment thus made, either a normalize processing or a denormalize processing is executed. Also at this time, the left shifter unit 204 is used in common for the normalize processing and the denormalize processing.
The comparing and selecting circuit 401 in FIG 4 has the function that first and second 8-bit input data X, Y are compared in magnitude with each other, thereby to supply, as an output data Z, X when X is smaller than Y and a third 8-bit input data S when X is not smaller than Y, and that the logical value of the magnitude-relation judging signal output terminal B is set to 1 when X is smaller than Y. As shown in FIG 5, the comparing and selecting circuit 401 has an input circuit 411, an intermediate circuit 412 and an output circuit 413, and is arranged, like the minimum value selecting circuit 203, such that the magnitude relation of the two input data X, Y for each of the digits is propagated from the highest digit to the lowest digit, thus determining at a high speed the output data Z, starting with the highest digit.

The arrangement in FIGS 4 and 5 has the comparing and selecting circuit 401 for determining the output data Z successively from the highest digit, and the multi-stage left shifter unit 204 having a plurality of shifters to be successively operated, starting with the shifter in which the shift amount is the greatest. This enables the left shift processing on a mantissa M to be executed at a higher speed. The comparing and selecting circuit 401 is of the 8-bit arrangement and the left shifter unit 204 is of the 5-stage arrangement of left 2^k-bit shifters (k = 0 to 4), with the number of bits of each of the mantissa M and the exponent E for single precision taken into consideration. However, such arrangements may be suitably changed according to the number of bits of each of the mantissa M and the exponent E.
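A behavioral sketch of the second apparatus of FIG 4 follows. It assumes a nonzero 24-bit mantissa and E of at least 1, and its function names are illustrative. The point it makes is that the comparison uses LSA and E directly, while E-1 enters only as the third input S that is passed through when LSA is not smaller than E. Note that the polarity of CR is the reverse of that in FIG 2.

```python
def compare_and_select(x: int, y: int, s: int) -> tuple[int, int]:
    """Model of circuit 401: Z = x when x < y, else s; B = 1 when x < y."""
    b = 1 if x < y else 0
    return (x if b else s), b

def second_apparatus(M: int, E: int, frac_bits: int = 23) -> tuple[int, int]:
    """Behavioral model of FIG 4; returns (m, e)."""
    lsa = frac_bits - (M.bit_length() - 1)        # advancing 1 detecting circuit 202
    sh, cr = compare_and_select(lsa, E, E - 1)    # circuit 401, with S = E-1 from 201
    m = M << sh                                   # left shifter unit 204
    e = (E - lsa) if cr else 0                    # circuits 206 and 402
    return m, e
```

For M = 0b11 this gives `(0xC00000, 8)` at E = 30 and `(1536, 0)` at E = 10, the same results as the minimum-value formulation of the first apparatus.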
In a third operational processing apparatus shown in FIG 6, a decrementer 201, an advancing 1 detecting circuit 202, a mantissa result register 205, a first selecting circuit 207 and an exponent result register 208 respectively have the same functions as those of the component elements designated by the same reference numerals in FIG 2. In FIG 6, there are also disposed a subtracting circuit 601, a second selecting circuit 602 and a left shifter 603.

The subtracting circuit 601 is adapted to supply, as a result of subtraction, a value obtained by subtracting an output LSA of the advancing 1 detecting circuit 202 from an exponent E, and also to supply a magnitude-relation judging signal Ib representing whether or not E is equal to or smaller than LSA. When E is not greater than LSA, Ib is equal to 1, and when E is greater than LSA, Ib is equal to 0. The first selecting circuit 207 is adapted to supply, as an exponent e of the result of an operational processing, 0 when Ib is equal to 1, and an output E-LSA of the subtracting circuit 601 when Ib is equal to 0. The second selecting circuit 602 is adapted to supply, as a shift amount SH, an output E-1 of the decrementer 201 when Ib is equal to 1, and an output LSA of the advancing 1 detecting circuit 202 when Ib is equal to 0. The left shifter 603 is adapted to supply, as a mantissa m of the result of an operational processing, a value obtained by executing, on a mantissa M, a left shift processing having a shift amount specified by an output SH of the second selecting circuit 602. The inside arrangement of the left shifter 603 is not limited to the multi-stage arrangement of the left shifter unit 204 in FIG 2.

The subtracting circuit 601 in FIG 6 has both the functions of the subtracting circuit 206 and the minimum value selecting circuit 203 shown in FIG 2. More specifically, the subtracting circuit 601 is adapted to supply a subtraction result E-LSA to be subjected to the correction of an exponent E, and to judge whether
the result of an operational processing is a normalized number or a denormalized number, based on the fact whether or not a value obtained by subtracting LSA from E is equal to or smaller than 0 Then, the shift amount SH of the mantissa M and an exponent e of the result of an operational processing can be determined such that, based on the judgment thus made, either a normalize processing or a denormalize processing is to be executed At this time, the left shifter 601 is commonly used for the normalize processing and the denormalize processing. The subtracting circuit 601 in FIG 6 has the function that a subtraction result X-Y of two 8-bit input data X, Y is set as an output data Z, and that the logical value of the magnitude-relation judging signal Ib is set to 1 when X is not greater than Y As shown in FIG 7, the subtracting circuit 601 has an input circuit 611, an intermediate circuit 612 and an output circuit 613, and is arranged such that the magnitude relation of the two input data X, Y for each of the digits is propagated from the lowest digit to the highest digit, thus determining the output data Z. 
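The selection logic of FIG. 6 described above can be paraphrased in software. The following is only a rough sketch under stated assumptions: the mantissa is modelled as an unsigned integer of `width` bits whose normalized form has the top bit set, LSA is taken as the number of left shifts needed to reach that form, and E is assumed to be at least 1. The names `leading_shift_amount` and `post_process` are illustrative and do not come from the patent.

```python
def leading_shift_amount(mantissa: int, width: int) -> int:
    """LSA: left shifts needed to bring the leading 1 to the top bit."""
    if mantissa == 0:
        return width  # assumption of this sketch: no leading 1 present
    return width - mantissa.bit_length()


def post_process(M: int, E: int, width: int = 24):
    """Return (m, e, SH, Ib) following the FIG. 6 description.

    Assumes E >= 1 so that the decrementer output E-1 is non-negative.
    """
    lsa = leading_shift_amount(M, width)
    ib = 1 if E <= lsa else 0          # magnitude-relation judging signal
    sh = E - 1 if ib else lsa          # second selecting circuit 602
    e = 0 if ib else E - lsa           # first selecting circuit 207
    m = (M << sh) & ((1 << width) - 1) # left shifter 603
    return m, e, sh, ib


# Denormalize case: exponent too small to shift the leading 1 to the top.
denorm = post_process(0b1000, 10)
# Normalize case: exponent large enough for a full normalizing shift.
norm = post_process(1 << 21, 10)
```

In the first call the exponent is exhausted before the leading 1 reaches the top bit, so the result exponent is 0 and the shift is E-1; in the second call the shift is LSA and the exponent is reduced by the same amount.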
When the respective bits of the input and output data X, Y, Z are denoted Xi, Yi, Zi (i = 0 to 7), the input circuit 611 forms a digit borrow generating signal Igi and a digit borrow propagating signal Ipi for each digit. As is widely known, the digit borrow generating signal Igi is a signal for executing subtraction, formed such that Igi = 1 represents that, in the operation Xi-Yi on the ith digit, digit borrowing from the (i+1)th digit has taken place. However, Igi = 1 also represents that Xi is smaller than Yi. As is also widely known, the digit borrow propagating signal Ipi is another signal for executing subtraction, formed for judging that, in the operation Xi-Yi, when digit borrowing has taken place from the ith digit to the (i-1)th digit and Ipi is equal to 1, digit borrowing from the (i+1)th digit has taken place. Since in that case the borrowing from the (i+1)th digit is caused by the borrowing on the (i-1)th digit, Ipi = 1 also represents that Xi is equal to Yi.
Based on the outputs Igi and Ipi of the input circuit 611, the intermediate circuit 612 forms a digit borrow generating signal Igjk and a digit borrow propagating signal Ipjk for the digits from the kth digit to the jth digit (k is smaller than j). For example, the digit borrow generating signal Ig32 from the second digit to the third digit is a signal for executing subtraction, formed such that Ig32 = 1 represents that, in the two-bit operation X3X2-Y3Y2, digit borrowing from the fourth digit has taken place. However, Ig32 = 1 also represents the two-bit magnitude relation that X3X2 is smaller than Y3Y2. On the other hand, the digit borrow propagating signal Ip32 from the second digit to the third digit is another signal for executing subtraction, formed for judging that, in the operation X3X2-Y3Y2, when digit borrowing has taken place from the second digit to the two bits of the first and zeroth digits and Ip32 is equal to 1, digit borrowing from the fourth digit has taken place. Since in that case the borrowing from the fourth digit is caused by the borrowing on the first or zeroth digit, Ip32 = 1 also represents the two-bit equivalence relation that X3X2 is equal to Y3Y2.

The digit borrow generating signals Igi, Igjk and the digit borrow propagating signals Ipi, Ipjk are propagated from the lowest digit to the highest digit. When the digit borrow generating signal Igi0 for the digits from the lowest digit (the zeroth digit) to each digit (the ith digit) has been obtained, the output circuit 613 generates Zi, for each digit, based on Ipi and Ig(i-1)0. Z1, however, is generated based on Ip1 and Ig0. Since no digit is borrowed from the lowest digit, Z0 is determined based on Ip0 only.
When at least one of the digit borrow generating signal Ig70 and the digit borrow propagating signal Ip70 for the digits from the zeroth digit to the 7th digit is 1, this represents that X is not greater than Y. More specifically, the magnitude-relation judging signal Ib can be expressed as the logical OR of Ig70 and Ip70. Further equations hold for Ig70 and Ip70, and accordingly the following equation is established: EQU1. In the output circuit 613 in FIG. 7, the magnitude-relation judging signal Ib is generated using the relation of equation (6).

Generally, in a subtracting circuit executing X-Y, it is easy to judge whether or not the subtraction result is negative: it is enough to judge whether or not a digit is borrowed from the highest digit. It is difficult, however, to judge whether or not the subtraction result is not greater than 0, because it is difficult to judge whether or not the result is equal to 0. One could add a circuit that checks that all the bits of the subtraction result are 0, or that checks that X-Y is not negative and X-Y-1 is negative, but this may increase the amount of hardware of the subtracting circuit. In the subtracting circuit 601 in FIG. 7, by contrast, most of the hardware is used in common for the calculation of the output data Z and for the generation of the magnitude-relation judging signal Ib representing that X is not greater than Y (that is, that X-Y is not greater than 0). It is therefore possible to reduce the amount of hardware.
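The relation between the borrow signals and the judging signal Ib can be checked with a small software model. This is a sketch only: it uses the textbook definitions of borrow generate (Xi = 0, Yi = 1) and borrow propagate (Xi equal to Yi), which are assumed here to correspond to Igi and Ipi, and it combines the group signals serially in a loop rather than in the tree of FIG. 7. The function name `subtract_with_le_flag` is hypothetical.

```python
def subtract_with_le_flag(x: int, y: int, bits: int = 8):
    """Return (Z, Ib): Z = (x - y) mod 2**bits, Ib = 1 when x <= y."""
    # Per-digit signals: g[i] = 1 when Xi < Yi, p[i] = 1 when Xi == Yi.
    g = [((~x >> i) & 1) & ((y >> i) & 1) for i in range(bits)]
    p = [1 - (((x >> i) ^ (y >> i)) & 1) for i in range(bits)]

    # Group signals for digits 0..i (the role of Ig(i)0 and Ip(i)0).
    group_g, group_p = [], []
    gg, pp = 0, 1
    for i in range(bits):
        gg = g[i] | (p[i] & gg)  # borrow out of digits 0..i
        pp = p[i] & pp           # digits 0..i are all equal
        group_g.append(gg)
        group_p.append(pp)

    # Zi depends on Ipi and the group generate of the digits below it;
    # Z0 depends on Ip0 only because nothing is borrowed into digit 0.
    z = 0
    for i in range(bits):
        borrow_in = group_g[i - 1] if i > 0 else 0
        z |= ((1 - p[i]) ^ borrow_in) << i

    ib = group_g[-1] | group_p[-1]  # X < Y or X == Y
    return z, ib


example = subtract_with_le_flag(5, 3)
equal_case = subtract_with_le_flag(9, 9)
```

The same group signals feed both the difference bits and the comparison flag, which is the hardware-sharing point the text makes.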
In a fourth operational processing apparatus, shown in FIG. 8, the decrementer 201 of FIG. 6 is removed and a right 1-bit shifter 604 is interposed between a left shifter 603 and a mantissa result register 205. The left shifter 603 and the right 1-bit shifter 604 form a bidirectional shifter 605.

A second selecting circuit 602 is adapted to supply to the left shifter 603, as a shift amount SH, the exponent E when Ib is equal to 1, and the output LSA of the advancing 1 detecting circuit 202 when Ib is equal to 0. The right 1-bit shifter 604 is adapted to supply, as a mantissa m of the result of an operational processing, the value obtained by executing a right 1-bit shift processing on the output of the left shifter 603 when Ib is equal to 1, and the output of the left shifter 603 itself when Ib is equal to 0.

According to the arrangement in FIG. 8, when the subtracting circuit 601 having the inside arrangement shown in FIG. 7 judges that the result of an operational processing is a denormalized number (Ib = 1), the shift amount SH given to the left shifter 603 is set to E and the shift operation of the right 1-bit shifter 604 is enabled. As a result, a left shift processing with the desired shift amount E-1 is executed on the mantissa M. On the other hand, when it is judged that the result of the operational processing is a normalized number (Ib = 0), the shift amount SH given to the left shifter 603 is set to LSA and the shift operation of the right 1-bit shifter 604 is disabled. As a result, a left shift processing with the desired shift amount LSA is executed on the mantissa M. In other words, the provision of the right 1-bit shifter 604 eliminates the decrementer 201 of FIG. 6, thus simplifying the arrangement of the operational processing apparatus. The method of determining the exponent e of the result is similar to that shown in FIG. 6.

In the embodiment in FIG. 8, the right 1-bit shifter 604 is disposed at the output side of the left shifter 603, but it may instead be disposed at the input side of the left shifter 603.

IEEE 754: standard for binary floating-point arithmetic. Author: Yashkardin Vladimir.

In normalized exponential form the decimal number 155.625 is written 1.55625 × 10^2 = 1.55625 exp10 +2. The number 1.55625 exp10 +2 consists of two parts: a mantissa M = 1.55625 and an exponent exp10 = +2. If the mantissa is in the range 1 <= M < 10, the number is in normalized form.

3.2 Representation in denormalized exponential form.

Take, for example, the decimal number 155.625. In denormalized exponential form it is written 0.155625 × 10^3 = 0.155625 exp10 +3. The number 0.155625 exp10 +3 consists of two parts: a mantissa M = 0.155625 and an exponent exp10 = +3. If the mantissa is in the range 0.1 <= M < 1, the number is in denormalized form.

3.3 Converting a decimal number to a binary floating-point number.

The problem reduces to representing a decimal floating-point number as a binary floating-point number in normalized exponential form. To do this, expand the given number in binary digits:

155.625 = 1·2^7 + 0·2^6 + 0·2^5 + 1·2^4 + 1·2^3 + 0·2^2 + 1·2^1 + 1·2^0 + 1·2^-1 + 0·2^-2 + 1·2^-3
155.625 = 128 + 0 + 0 + 16 + 8 + 0 + 2 + 1 + 0.5 + 0 + 0.125
155.625 (base 10) = 10011011.101 (base 2)

Bring the resulting number to normalized form in the decimal and binary systems:

1.55625 exp10 +2 = 1.0011011101 exp2 +111

As a result we have the main components of the normalized exponential binary number: mantissa M = 1.0011011101, exponent exp2 = +111.

4 Description of converting numbers to IEEE 754.

4.1 Transformation of a normalized binary number into the 32-bit IEEE 754 format.

The formats mainly used in engineering and programming are the 32-bit and 64-bit ones. For example, VB uses the data types Single (32 bit) and Double (64 bit). Consider the transformation of the binary number 10011011.101 into the single-precision 32-bit format of IEEE Standard 754. The other IEEE 754 number formats are enlarged copies of the single-precision format.
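The binary expansion and normalized form worked out above can be cross-checked against an actual single-precision encoding. The sketch below uses Python's standard `struct` module as a stand-in for the VB Single type mentioned in the text; the helper name `float32_fields` is illustrative.

```python
import struct


def float32_fields(value: float):
    """Split the IEEE 754 single-precision encoding of value into fields."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    sign = raw >> 31                 # 1 bit
    biased_exp = (raw >> 23) & 0xFF  # 8 bits
    fraction = raw & 0x7FFFFF        # 23 bits, leading 1 not stored
    return raw, sign, biased_exp, fraction


# 10011011.101 in binary is the integer 0b10011011101 divided by 2^3.
assert 0b10011011101 / 8 == 155.625

raw, sign, biased_exp, fraction = float32_fields(155.625)
print(f"{raw:08X}")                                   # hexadecimal word
print(sign, f"{biased_exp:08b}", f"{fraction:023b}")  # sign, exponent, fraction
print(biased_exp - 127)                               # unbiased exponent
```

The fraction field holds the digits after the leading 1 of 1.0011011101, padded with zeros to 23 bits, and the stored exponent minus 127 gives back the binary exponent 111 (7 in decimal).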
To represent a number in the single-precision IEEE 754 format, it must first be brought to binary normalized form. In section 3 we did this conversion for the number 155.625. Now consider how a normalized binary number is converted to the 32-bit IEEE 754 format.

Description of the transformation into the 32-bit IEEE 754 format.

A number can be + or -. Therefore one bit is set aside to designate the sign: 0 for positive, 1 for negative. This is the most significant bit of the 32-bit sequence.

Next come the exponent bits, for which 1 byte (8 bits) is allocated. The exponent, like the number, can have the sign + or -. To encode the sign of the exponent without introducing yet another sign bit, an offset of half a byte, +127 (0111 1111), is added to the exponent. That is, if the exponent is +7 (+111 in binary), the biased exponent is 7 + 127 = 134; if the exponent is -7, the biased exponent is 127 - 7 = 120. The biased exponent is written in the allotted 8 bits. When the exponent of the binary number is needed again, 127 is simply subtracted from this byte.

The remaining 23 bits are set aside for the mantissa. However, the first bit of a normalized binary mantissa is always 1, since the number is in the range 1 <= M < 2. The table shows the decimal number 155.625 in the 32-bit IEEE 754 format:

sign: 0
biased exponent: 134 = 1000 0110
mantissa: 001 1011 1010 0000 0000 0000

2^971 ≈ 1.99584e+292.

From the above, the bulk of the numbers in IEEE 754 format have a stable, small relative error. The maximum possible relative error for a Single number is 2^-23 × 100% = 11.920928955078125e-6 %. The maximum possible relative error for a Double number is 2^-52 × 100% = 2.2204460492503130808472633361816e-14 %.

7.5 General information on single- and double-precision numbers in IEEE standard 754.

Table 3. Information about the 32/64-bit formats in the standard ANSI/IEEE Std 754-1985.

                                      single                              double
length of number, bits                32                                  64
biased exponent (E), bits             8                                   11
remainder of the mantissa (M), bits   23                                  52
denormalized number, decimal          F = (-1)^S · 2^(E-126) · M/2^23     F = (-1)^S · 2^(E-1022) · M/2^52
normalized number, decimal            F = (-1)^S · 2^(E-127) · (1 + M/2^23)   F = (-1)^S · 2^(E-1023) · (1 + M/2^52)
smallest positive number              2^-149 ≈ 1.40129846e-45             2^-1074 ≈ 4.94065646e-324
largest number                        2^127 · (2 - 2^-23) ≈ 3.40282347e+38    2^1023 · (2 - 2^-52) ≈ 1.79769313e+308

The table also has rows for the denormalized and normalized binary layouts, the maximum absolute error, and the maximum relative error for denormalized and for normalized numbers.

8 Rounding numbers in standard IEEE 754.

When numbers are represented in the floating-point format of IEEE Standard 754, they often have to be rounded. The standard provides four ways of rounding numbers.

Ways of rounding numbers in IEEE 754:
Rounding to the nearest.
Rounding toward zero.
Rounding toward +infinity.
Rounding toward -infinity.

Table 3. Examples of rounding to one decimal place.

How rounding works is shown in the examples in Table 3. When a number is converted, one of the ways of rounding must be chosen. By default it is the first way, rounding to the nearest. Devices often use the second method, rounding toward zero. When rounding toward zero, the insignificant digits are simply discarded, so this is the easiest method to implement in hardware.

9 Computing problems caused by using the standard IEEE754.

The IEEE 754 standard is widely used in engineering and programming. Most modern microprocessors are manufactured with a hardware implementation of real variables in the IEEE754 format. Neither the programming language nor the programmer can change this situation: no other representation of a real number exists in the microprocessor. When the standard IEEE754-1985 was created, representing a real variable in 4 or 8 bytes seemed very large, since the amount of RAM under MS-DOS was 1 MB and a program in that system could use only 0.64 MB. For modern operating systems a size of 8 bytes is negligible; nevertheless, the variables in most microprocessors continue to be in the IEEE754-1985 format.

Consider the computing errors caused by the use of numbers in the IEEE754 format.

9.1 Errors associated with the accuracy of representation of real numbers in the IEEE754 format. A dangerous reduction.
This error is always present in computer calculations. The reason for its occurrence is described in paragraph 7.4: the relative error is about 10^-6 % for single and 10^-14 % for double. The absolute errors can be significant, up to about 10^31 for single and 10^292 for double, which may cause problems with calculations.

If the example (the subtraction 123456789 - 123456788 carried out in single precision) is computed on paper, the answer is 1. The absolute error of the computed result is 7. Why is the answer wrong? The number 123456789 in single is 4CEB79A3 hex (IEEE) = 123456792 dec; the absolute representation error is 3. The number 123456788 in single is 4CEB79A2 hex (IEEE) = 123456784 dec; the absolute representation error is -4. The relative error of the initial numbers is approximately 3.24e-6 %. After a single operation the relative error of the result became 800 %, that is, it increased by a factor of 2.5e+8. This is what I call "a dangerous reduction": a catastrophic loss of accuracy in an operation where the absolute value of the result is much smaller than any of the input variables.

In fact, representation error is the most innocuous error in computer calculations, and many programmers usually pay no attention to it. Nevertheless, it can be very frustrating.

9.2 Errors associated with improper coercion of data types. Wild error.
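Both the subtraction of section 9.1 and the single/double mismatch that section 9.2 goes on to discuss can be reproduced with a short sketch. It assumes that round-tripping a value through `struct` with the `f` format is an adequate model of storing it in a Single variable; the helper name `to_single` is illustrative and not taken from the article.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Section 9.1: both operands are rounded before the subtraction happens.
a = to_single(123456789.0)
b = to_single(123456788.0)
diff = to_single(a - b)
print(a, b, diff)

# Section 9.2: the same decimal literal stored as single and as double.
wide = 123456789.123456789
gap = to_single(wide) - wide
print(to_single(wide), wide, gap)
```

On paper the first difference is 1; in single precision neighbouring representable values near 1.2e8 are 8 apart, so the rounding of the operands dominates the result.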
These errors are caused by the fact that the same original number represented in the single format and in the double format usually gives two values that are not equal to each other. For example, for the original number 123456789.123456789:

Single: 4CEB79A3 = 123456792.0 dec
Double: 419D6F34547E6B75 = 123456789.12345679104328155517578125

The difference between Single and Double is 2.87654320895671844482421875. An example for VB illustrates the relative error of the result.

Entering the number 2.2250738585072011e-308 caused the process to hang with nearly 100 % CPU load. Other numbers from this range caused no problems (2.2250738585072009e-308, 2.2250738585072010e-308, 2.2250738585072012e-308). The bug report was received on 30.12.2010, and the bug was fixed by the developers on 10.01.2011. Since PHP is a preprocessor used by most servers, any network user was able, for ten days, to bring down any host. The developers write that the bug only occurs on 32-bit systems, but I think that if the precision of the boundary value is increased, 64-bit systems will hang too (not verified). The reason for the panic is clear: any user with a certain level of diligence and knowledge had the opportunity to take down most of the information resources of the planet within ten days. More examples of such numbers and such errors could be given.

10 The final part.

From the above it is clear that the view that the error of a floating-point result does not exceed the relative representation error of the largest number involved is false. The errors listed in section 9 add up. Errors such as "dirty zero" and "dangerous reduction" can make calculation errors unacceptable. When programming computer calculations, the programmer should pay particular attention to results close to zero.
Some experts believe that this number format represents a threat to humanity. You can read about it in the article "IEEE754-tick threatens mankind". Although many of the facts in that article are over-dramatized, and possibly misinterpreted, the problem of computing is reflected correctly in a philosophical sense.

I am not inclined to dramatize calculations based on the IEEE754 standard. The standard has been in operation since 1985 and was fully incorporated into the standard IEEE754-2008, which extended the precision of calculations. However, the problem of the reliability of computing is very urgent today, and the standard IEEE754-2008 and the ISO recommendations have not solved it. I think this area needs an innovative idea, which the developers of the standard IEEE754-2008 unfortunately do not possess.

The main innovative ideas in our world have usually come from amateurs, like-minded people working not for money. A striking example is the invention of the telephone. When a school teacher, Alexander Graham Bell, came with his patent for the invention of the telephone to the president of the telecommunications company Western Union Company, which owned the transatlantic cable connection, and offered to sell him the patent, he was not thrown out, no. The president of the company submitted the question to a council of experts in telegraphy, consisting of specialists and scholars in the field of telecommunications. The experts gave their opinion that the invention was useless and had no prospects in telecommunications. Some experts even wrote in their report that it was circus trickery and charlatanism.

Alexander Graham Bell, together with his father-in-law, decided to promote his invention independently. After about 10 years, the telecommunications giant Western Union Co. had been virtually pushed out of telecommunication technologies by the telephone business. Today in many Russian cities you can see windows that say Western Union; this is a company engaged in transferring money around the world, and once it was an international telecommunications giant. We can conclude that the opinions of experts on innovative technologies are useless. If you think that something has changed in people's minds since the invention of the telephone (1877), you are wrong.

If scientists, who invent the new, and professionals, who know how to use the well known, cannot solve the problem, innovation is needed.

Links to new ideas in the field of representation of real numbers in hardware:
1 Approksimetika
2
If you know of other innovative ideas in the field of representation of real numbers, we will be happy to receive links to these sources.

I would suggest representing real numbers as fixed-point numbers. To cover the full range of Double numbers it is enough to have a variable consisting of 1075 bits of integer part and 1075 bits of fractional part, that is, about 270 bytes per variable. In this case all numbers are represented with the same absolute accuracy. One can work with numbers over the entire range of the real axis, so it becomes possible to add large numbers to small numbers. The step between numbers on the real axis is uniform, that is, the real axis is linear. There will be only one data type; integer, real and other types are not needed. The problem here is implementing microprocessor registers 270 bytes wide, but that is not a problem for modern technology.
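The fixed-point proposal can be tried out in software before any hardware exists. The sketch below is an assumption-laden illustration: Python's arbitrary-precision integers stand in for the roughly 270-byte register, `fractions.Fraction` is used to convert a double exactly, only finite values are handled, and the names `FRAC_BITS`, `to_fixed` and `from_fixed` are hypothetical.

```python
from fractions import Fraction

FRAC_BITS = 1075  # fractional bits, the figure proposed in the text


def to_fixed(x: float) -> int:
    """Exact fixed-point image of a finite double: x * 2**FRAC_BITS."""
    scaled = Fraction(x) * (1 << FRAC_BITS)
    # Every finite double is a multiple of 2**-1074, so this is an integer.
    assert scaled.denominator == 1
    return scaled.numerator


def from_fixed(n: int) -> Fraction:
    """Exact rational value of a fixed-point integer."""
    return Fraction(n, 1 << FRAC_BITS)


big, small = 1e300, 1e-300

# In double precision the small term is lost completely.
lost = (big + small) == big

# In fixed point the sum is exact, because the step is uniform.
exact_sum = from_fixed(to_fixed(big) + to_fixed(small))
kept = exact_sum == Fraction(big) + Fraction(small)
```

Addition becomes plain integer addition with no rounding, which is the uniform absolute accuracy the paragraph argues for; the cost is the very wide word.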
To write section 9 I had to create a program that represents a number as a fixed-point variable of length 1075 + 1075 bytes, in which the number is stored as a string of ASCII characters, one character per digit. All the arithmetic operations then had to be written for ASCII strings. The program works like a calculation on paper. Since it does not use the mathematical abilities of the microprocessor, it runs slowly. Why did I do it? I could not find a program that could represent a number in IEEE754 format exactly in decimal form. Nor did I find a program, although such programs no doubt exist, in which 1075 significant decimal digits can be entered into an input box.

Here, for example, is the exact decimal value of the double number 7FEFFFFFFFFFFFFF:

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0

You can use the program IEEE754 v.1.0 to study and evaluate the errors that arise when working with real numbers in the IEEE754 format.

References
1 IEEE Standard for Binary Floating-Point Arithmetic. Copyright 1985 by The Institute of Electrical and Electronics Engineers, Inc., 345 East 47th Street, New York, NY 10017, USA.

Acknowledgments: Sitkarevu, for assistance in creating the article.

Floating Point Representation Basics.

There are posts on the representation of floating point format. The objective of this article is to provide a brief introduction to floating point format.

The following description explains the terminology and primary details of IEEE 754 binary floating point representation. The discussion is confined to the single and double precision formats.

Usually, a real number in binary is represented in the following format:

I_m I_(m-1) ... I_2 I_1 I_0 . F_1 F_2 ... F_n
where each I_m and F_n is either 0 or 1, in the integer and fraction parts respectively.

A finite number can also be represented by four integer components: a sign (s), a base (b), a significand (m), and an exponent (e). The numerical value of the number is then evaluated as

(-1)^s × m × b^e, where m < |b|.

Depending on the base and the number of bits used to encode the various components, the IEEE 754 standard defines five basic formats. Among them, the binary32 and binary64 formats are the single precision and double precision formats respectively, in which the base is 2.

Table 1. Precision Representation.

Single Precision Format.

As mentioned in Table 1, the single precision format has 23 bits for the significand (plus 1 implied bit, details below), 8 bits for the exponent and 1 bit for the sign.

For example, the rational number 9/2 can be converted to single precision float format as follows. The result is said to be normalized if it is represented with a leading 1 bit, i.e. 1.001(2) × 2^2. Similarly, when the number 0.000000001101(2) × 2^3 is normalized, it appears as 1.101(2) × 2^-6. Omitting this implied 1 at the left extreme gives the mantissa of the float number. A normalized number provides more accuracy than the corresponding de-normalized number. Because the most significant bit is implied rather than stored, the significand effectively has 23 + 1 = 24 bits. Floating point numbers are to be represented in normalized form.

The subnormal numbers fall into the category of de-normalized numbers. The subnormal representation slightly reduces the exponent range; such a number cannot be normalized, since that would result in an exponent which does not fit in the field. Subnormal numbers are less accurate than normalized numbers, i.e. they have less room for nonzero bits in the fraction field. Indeed, the accuracy drops as the size of the subnormal number decreases. However, the subnormal representation is useful in filling the gaps of the floating point scale near zero.
In other words, the above result can be written as (-1)^0 × 1.001(2) × 2^2, which yields the integer components s = 0, b = 2, significand m = 1.001, mantissa = 001 and e = 2. The corresponding single precision floating point number is represented in binary as

0 10000001 00100000000000000000000

where the exponent field is supposed to be 2, yet is encoded as 129 (127 + 2), called the biased exponent. The exponent field is in plain binary format; negative exponents could also be represented with an encoding such as sign magnitude, 1's complement or 2's complement, but the biased exponent is what is used for the representation of negative exponents. The biased exponent has an advantage over the other negative representations when two floating point numbers are compared bitwise for equality.

A bias of 2^(n-1) - 1, where n is the number of bits used in the exponent, is added to the exponent (e) to get the biased exponent (E). So the biased exponent of a single precision number is obtained as

E = e + 127

The range of the exponent in single precision format is -126 to +127. Other values are used for special symbols.

Note: when we unpack a floating point number, the exponent obtained is the biased exponent. Subtracting 127 from the biased exponent gives the unbiased exponent.

The following figure represents the floating point scale.

Double Precision Format.

As mentioned in Table 1, the double precision format has 52 bits for the significand (plus 1 implied bit), 11 bits for the exponent and 1 bit for the sign. All other definitions are the same for the double precision format, except for the size of the various components.

The smallest change that can be represented in floating point representation is called precision. The fractional part of a single precision normalized number has exactly 23 bits of resolution (24 bits with the implied bit). This corresponds to log10(2^23) = 6.924, or about 7 (the characteristic of the logarithm) decimal digits of accuracy. Similarly, in the case of double precision numbers the precision is log10(2^52) = 15.654, or about 16 decimal digits.
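The biased-exponent encoding of 9/2 described above can be assembled by hand and compared with what the machine produces. This is a minimal sketch; `BIAS` and `assemble_single` are illustrative names, and Python's `struct` module is used only to decode the assembled 32-bit word.

```python
import struct

EXP_BITS = 8
BIAS = (1 << (EXP_BITS - 1)) - 1  # 2^(n-1) - 1 with n = 8 exponent bits


def assemble_single(sign: int, e: int, mantissa_bits: str) -> int:
    """Build a 32-bit word from sign, unbiased exponent and mantissa digits."""
    biased = e + BIAS                                 # E = e + 127
    fraction = int(mantissa_bits.ljust(23, "0"), 2)  # pad mantissa to 23 bits
    return (sign << 31) | (biased << 23) | fraction


# 9/2 = 1.001(2) x 2^2: s = 0, e = 2, mantissa = 001
word = assemble_single(0, 2, "001")
value = struct.unpack(">f", word.to_bytes(4, "big"))[0]

# Unpacking reverses the bias: subtract 127 from the stored exponent.
unbiased = ((word >> 23) & 0xFF) - BIAS
print(f"{word:032b}", value, unbiased)
```

Because the stored exponent is an unsigned quantity, the bit patterns of positive floats order the same way as the values they represent, which is one practical benefit of the bias.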
Accuracy in floating point representation is governed by the number of significand bits, whereas range is limited by the exponent. Not all real numbers can be represented exactly in floating point format. For any number x which is not a floating point number, there are two options for a floating point approximation: the closest floating point number less than x, written x-, and the closest floating point number greater than x, written x+. A rounding operation is performed on the number of significant bits in the mantissa field based on the selected mode. The round down mode sets x to x-; the round up mode sets x to x+; the round towards zero mode sets x to either x- or x+, whichever lies between zero and x. The round to nearest mode sets x to x- or x+, whichever is nearest to x. Round to nearest is usually the most used mode. The closeness of the floating point representation to the actual value is called accuracy.

Special Bit Patterns.

The standard defines a few special floating point bit patterns. Zero cannot have a most significant 1 bit, hence it cannot be normalized. The hidden bit representation requires a special technique for storing zero. There are two different bit patterns, +0 and -0, for the same numerical value zero. For single precision floating point representation, these patterns are given below.

0 00000000 00000000000000000000000 = +0
1 00000000 00000000000000000000000 = -0

Similarly, the standard defines two different bit patterns for +INF and -INF. They are given below.

0 11111111 00000000000000000000000 = +INF
1 11111111 00000000000000000000000 = -INF

All of these special numbers, as well as the other special numbers below, are represented through the use of a special bit pattern in the exponent field. This slightly reduces the exponent range, but this is quite acceptable since the range is so large.
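The four patterns listed above can be printed directly from the machine representation. The following sketch assumes that packing with `struct` in the `f` format yields the IEEE 754 single-precision encoding on the platform in use; `bits32` is a hypothetical helper name.

```python
import struct


def bits32(x: float) -> str:
    """Return the single-precision encoding as 'sign exponent fraction'."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{raw:032b}"
    return f"{s[0]} {s[1:9]} {s[9:]}"


for label, x in [("+0", 0.0), ("-0", -0.0),
                 ("+INF", float("inf")), ("-INF", float("-inf"))]:
    print(bits32(x), label)

# The two zeros differ in their bit patterns but compare as equal.
zeros_equal = (0.0 == -0.0)
```

The all-zeros and all-ones exponent fields are what reserve these encodings, which is why the usable exponent range of normalized numbers is slightly smaller than 8 bits would otherwise allow.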
An attempt to compute expressions like 0 × INF makes no mathematical sense. The standard calls the result of such expressions Not a Number (NaN). Any subsequent expression with NaN yields NaN. The representation of NaN has a non-zero significand and all 1s in the exponent field. An example for single precision format is shown below (x is a don't care bit).

x 11111111 m1000000000000000000000

where m can be 0 or 1. This gives two different representations of NaN; following IEEE 754-2008, the leading fraction bit m is 1 for a quiet NaN and 0 for a signaling NaN, and the second fraction bit is set here so that the significand stays non-zero in both cases.

0 11111111 01000000000000000000000 Signaling NaN (SNaN)
0 11111111 11000000000000000000000 Quiet NaN (QNaN)

Usually QNaN and SNaN are used for error handling. QNaNs do not raise any exceptions as they propagate through most operations, whereas SNaNs, when consumed by most operations, raise an invalid exception.

Overflow and Underflow.

Overflow is said to occur when the true result of an arithmetic operation is finite but larger in magnitude than the largest floating point number which can be stored using the given precision. Underflow is said to occur when the true result of an arithmetic operation is smaller in magnitude (infinitesimal) than the smallest normalized floating point number which can be stored. Overflow cannot be ignored in calculations, whereas underflow can effectively be replaced by zero.

The IEEE 754 standard defines a binary floating point format. The architecture details are left to the hardware manufacturers. The storage order of the individual bytes of a binary floating point number varies from architecture to architecture.

Method for eletronically representing a number, adder circuit and computer system US 5923575 A.
The invention relates to a method for electronically representing a number V in a binary data word. Both the exponent and the mantissa are represented in 2's complement. The mantissa is normalized to 01.F if the number V is positive, where F is the fraction of the mantissa. In case the number V is negative, the fraction F is normalized to 10.F. Use of this format allows the design of an improved adder which requires less hardware.

Claims (11)

1. A method for electronically representing a number V in a binary data word, the data word having a set of exponent bits E and having a set of mantissa bits M, the method comprising the steps of:
representing the exponent bits E in 2's complement form; and
representing the mantissa bits M in 2's complement form, whereby:
in case that the number V is positive, a fraction F of the mantissa bits M of the number V is normalized to a 01.F form and the exponent bits E are adapted by shifting the number V a number of times and adding the number of shifts to the exponent bits E of the number V; and
in case that the number V is negative, the fraction F of the mantissa bits M is normalized to a 10.F form and the exponent bits E are adapted by converting the number V into a 2's complement form, shifting the number V a number of times, and adding the number of shifts to the exponent bits E of the number V; and
dropping the leading mantissa bit to form a binary word including the resulting exponent bits E and mantissa bits.

2. The method according to claim 1, whereby one of the mantissa bits M is a sign bit and the remaining sub-set of bits is the fraction F, so that the number V equals:
in case that the sign bit indicates that the number V is positive;
in case that the sign bit indicates that the number V is negative.

a number of computing units; and
an inverse log converter;
wherein the input log converter is adapted to convert input data words into a log domain and to shift log converted input data words into the data pipeline;
wherein the data pipeline is coupled to the computing units, so that when a data word is shifted through the data pipelines consecutive computing units receive the data word as an input. wherein each computing unit has an output coupled to the inverse log converter to perform a conversion back from the log domain to obtain a result and. wherein an input data word V is electronically represented in the log domain in a binary data word, the data word having a set of exponent bi ts E and having a set of mantissa bits M, the exponent bits E being represented in 2 complement form and the mantissa bits M being represented in 2 complement form whereby. in case that the number V is positive, a fraction F of the mantissa bits M Of the number V is normalized to 01 F form and the exponent bits E are adapted by shifting the number V a number of times and adding the number shifts to the exponent bits E of the number V and. in case that the number V is negative, the fraction F of the mantissa bits M is normalized to a 10 F form and the exponent bits E are adapted by converting the number V into a 2 complement form, shifting the number V a number of times, and adding the number of shifts to the exponent bits E of the number V and. dropping the leading mantissa bit to form a binary word including the resulting exponent bits E and mantissa bits.11 A computer system comprising. an input log converter. a data pipeline. a number of computing units, each computing unit having an adde r for adding a first number M A and a second number M B , the first and second numbers being normalized to have either a leading 01 or a leading 10 in a binary representation, wherein the adder circuit comprises. a an adder block for adding the first number M A and the second number M B to obtain a result. b a leading msb detector coupled to an output of the adder block to detect a sequence of leading 0 or 1 bits in the result, the sequence having a length L and. 
(c) a barrel shifter to shift the result for a number of L-1 shifts to the left in order to normalize the result; and an inverse log converter, wherein the input log converter is adapted to convert input data words into a log domain and to shift log converted input data words into the data pipeline, wherein the data pipeline is coupled to the computing units, so that when a data word is shifted through the data pipeline consecutive computing units receive the data word as an input, and wherein each computing unit has an output coupled to the inverse log converter to perform a conversion back from the log domain to obtain a result.
The present invention is related to the following inventions, which are assigned to the same assignee as the present invention:
1. "Computer Processor Utilizing Logarithmic Conversion and Method of Use thereof", having Ser. No. 08/430,158, filed on Mar. 13, 1995, now U.S. Pat. No. 3,597,670.
2. "Exponentiator Circuit Utilizing Shift Register and Method of Using Same", having Ser. No. 08/401,515, filed on Mar. 10, 1995, now U.S. Pat. No. 5,553,012.
3. "Accumulator Circuit and Method of Use Thereof", having Ser. No. 08/455,927, filed on May 31, 1995, now U.S. Pat. No. 5,644,520.
4. "Logarithm/Inverse-Logarithm Converter and Method of Using Same", having Ser. No. 08/381,368, filed on Jan. 31, 1995, now U.S. Pat. No. 5,642,305.
5. "Logarithm/Inverse-Logarithm Converter Utilizing Second Order Term and Method of Using Same", having Ser. No. 08/382,467, filed on Jan. 31, 1995, now U.S. Pat. No. 5,703,801.
6. "Logarithm/Inverse-Logarithm Converter Utilizing Linear Interpolation and Method of Using Same", having Ser. No. 08/391,880, filed on Feb. 22, 1995, now U.S. Pat. No. 5,600,581.
7. "Logarithm/Inverse-Logarithm Converter Utilizing a Truncated Taylor Series and Method of Use Thereof", having Ser. No. 08/381,167, filed on Jan. 31, 1995, now U.S. Pat. No. 5,604,691.
8. "Logarithm Converter Utilizing Offset and Method of Use Thereof", having Ser. No. 08/508,365, filed on Jul. 28, 1995, now U.S. Pat. No. 5,629,884.
9. "Method and System for performing a convolution operation", having Ser. No. 08/535,800, filed on Sep. 28, 1995.
TECHNICAL FIELD OF THE INVENTION
The present invention relates generally to computing and digital signal processing and, in particular, to techniques for electronically representing a number.
BACKGROUND OF THE INVENTION
For the purposes of computing and digital signal processing, in particular for telecommunication, it is known in the art to represent numbers as binary data words. Such a binary data word typically is representative of some real world value. In the case of digital signal processing, such a binary data word typically represents a sampled value of some real process, such as sampled speech or video data.
To represent a number in a binary data word for the purposes of computing or digital signal processing, a number of approaches are commonly used in the prior art. Integer numbers are usually represented in two's complement. In the two's complement form the most significant bit holds the sign, unless the data word is declared to be an unsigned integer value. The two's complement of a binary number is found by inverting all the digits of the number and then adding one. For example, the two's complement of 0001 is 1110 + 1 = 1111. In mathematical terms the two's complement $\bar{x}$ of a number x is
$\bar{x} = 2^{k} - x$ (1)
where both x and $\bar{x}$ are represented as binary numbers with k digits.
The most popular representation for floating-point numbers is the format according to ANSI/IEEE Standard 754-1985, which has been implemented by nearly all floating-point chip sets, including Intel's 8087/287/387 and Motorola's 68881, as well as chip sets from AMD. The IEEE standard is therefore universal in microcomputers that accept those chips, including the IBM PC.
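The invert-and-add-one rule for two's complement described above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `twos_complement` and the choice to pass the word width `k` explicitly are assumptions made for this example, not part of the patent text.

```python
def twos_complement(x: int, k: int) -> int:
    """Return the k-bit two's complement of x, for 0 <= x < 2**k."""
    mask = (1 << k) - 1
    inverted = ~x & mask           # invert every one of the k bits
    return (inverted + 1) & mask   # add one and discard any carry out of k bits


# The example from the text: 0001 -> 1110 + 1 = 1111
example = format(twos_complement(0b0001, 4), "04b")
```

For any nonzero x the result equals 2**k - x, which is the closed form of the same operation; for x = 0 the carry is discarded and the result is 0.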
The way a number is electronically represented for computing purposes strongly influences the performance of the computing or digital signal processing system that processes such a number, and therefore the hardware expense required to obtain a given computing throughput.
By definition, digital signal processing (DSP) is concerned with the representation of signals by sequences of numbers or symbols and the processing of these signals. DSP has a wide variety of applications and its importance is evident in such fields as pattern recognition, radio communications, telecommunications, radar, biomedical engineering, and many others.
At the heart of every DSP system is a computer processor that performs mathematical operations on signals. Generally, signals received by a DSP system are first converted to a digital format used by the computer processor. Then the computer processor executes a series of mathematical operations on the digitized signal. The purpose of these operations can be to estimate characteristic parameters of the signal or to transform the signal into a form that is in some sense more desirable. Such operations typically implement complicated mathematics and entail intensive numerical processing. Examples of mathematical operations that may be performed in DSP systems include matrix multiplication, matrix inversion, Fast Fourier Transforms (FFT), auto- and cross-correlation, Discrete Cosine Transforms (DCT), polynomial equations, and difference equations in general, such as those used to approximate Infinite Impulse Response (IIR) and Finite Impulse Response (FIR) filters.
Computer processors vary considerably in design and function. One aspect of a processor design is its architecture. Generally, the term computer architecture refers to the instruction set and organization of a processor. An instruction set is a group of programmer-visible instructions used to program the processor. The organization of a processor, on the other hand, refers to its overall structure
and composition of computational resources, for example, the bus structure, memory arrangement, and number of processing elements.
In a computer, a number of different organizational techniques can be used for increasing execution speed. One technique is execution overlap. Execution overlap is based on the notion of operating a computer like an assembly line, with an unending series of operations in various stages of completion. Execution overlap allows these operations to be overlapped and executed simultaneously.
One commonly used form of execution overlap is pipelining. In a computer, pipelining is an implementation technique that allows a sequence of the same operations to be performed on different arguments. The computation to be done for a specific instruction is broken into smaller pieces, i.e., operations, each of which takes a fraction of the time needed to complete the entire instruction. Each of these pieces is called a pipe stage. The stages are connected in a sequence to form a pipeline: arguments of the instruction enter at one end, are processed through the stages, and exit at the other end.
There are many different architectures, ranging from complex-instruction-set-computer (CISC) to reduced-instruction-set-computer (RISC) based architectures. In addition, some architectures have only one processing element, while others include two or more processing elements. Despite differences in architectures, all computer processors have a common goal, which is to provide the highest performance at the lowest cost. However, the performance of a computer processor is highly dependent on the problem to which the processor is applied, and few, if any, low-cost computer processors are capable of performing the mathematical operations listed above at the speeds required for some of today's more demanding applications. For example, MPEG data compression of an NTSC television signal can only be performed using expensive supercomputers or special purpose hardware.
Many other applications, such as matrix transformations in real-time graphics, require data throughput rates that exceed the capabilities of inexpensive single processors, such as microprocessors and commercially available DSP chips. Instead, these applications require the use of costly multiprocessor or multiple-processor computers. Although multiprocessor computers typically have higher throughput rates, they also include complex instruction sets and are generally difficult to program.
Therefore there is a need for an improved method for electronically representing a number in a binary data word, an improved adder circuit, a microprocessor incorporating such an adder circuit, and an improved computer system.
SUMMARY OF THE INVENTION
The invention is pointed out with particularity in the appended claims. Preferred embodiments of the invention are given in the dependent claims.
The invention is advantageous in that it allows both the exponent and the mantissa of a number to be represented in two's complement form. This is made possible by normalizing the mantissa differently depending on whether the number to be represented is positive or negative. Such normalizations can be carried out with minimal hardware expense by performing shift operations.
In case the number to be represented is 0, the invention allows the value 0 to be encoded in the exponent. For this purpose a predefined value of the exponent bits indicates that the number equals 0. This predefined value can be, for example, a leading 1 followed by a sequence of zeros. If the exponent has a width of 4 bits, the value zero would be represented by 1000, whereby the mantissa is a don't care in the example considered here.
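The zero encoding described above can be illustrated with a short Python sketch. The helper names `zero_code`, `is_zero` and `as_signed` are hypothetical and exist only for this example; the text itself fixes nothing beyond the bit pattern (a leading 1 followed by zeros).

```python
def zero_code(width: int) -> int:
    """Exponent bit pattern reserved for zero: a leading 1 followed by zeros."""
    return 1 << (width - 1)


def is_zero(exponent_bits: int, width: int) -> bool:
    """True if the exponent field carries the reserved zero pattern."""
    return exponent_bits == zero_code(width)


def as_signed(bits: int, width: int) -> int:
    """Read a width-bit pattern as a two's complement integer."""
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits


four_bit_zero = format(zero_code(4), "04b")  # the 4-bit case from the text
```

Read as a two's complement number, this pattern is the most negative exponent the field can hold, so reserving it gives up only the smallest representable magnitudes.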
Further, the method for electronically representing a number is advantageous in that two numbers represented in this way can be added more efficiently and with less hardware expense. Because the mantissa is represented in two's complement, it is not necessary to compare the mantissas of the two numbers to be added before the calculation is carried out, in contrast to the above-referenced IEEE standard.
Moreover, the mantissas are always added and never subtracted, even if they represent negative numbers. This is also due to the two's complement representation. An additional advantage is that no sign logic is needed. As a consequence, a microprocessor which uses the teaching of the invention can perform summations more efficiently and therefore have a higher computing throughput. If a computer program is to be carried out by the microprocessor, it can be carried out at a higher processing speed. If the computer program is a digital signal processing application, the microprocessor can deal with a higher sampling rate.
In digital signal processing, such as finite or infinite impulse response filtering, a large number of multiplications typically has to be carried out. If the two operands to be multiplied are converted into the log domain, the multiplication becomes a summation. The result is obtained by converting the sum back into the normal domain. A computer system of this type is disclosed in the above-identified related inventions No. 1 (Ser. No. 08/430,158) and No. 9 (Ser. No. 08/535,800). Implementation options for such a computer system are also described in several of the copending applications or patents 2 to 8.
Such a computer system operating in the log domain consists of a number of computing units, each of which comprises an adder in order to perform the multiplications in the log domain. If a number is represented according to the invention in such a computer system, this makes it possible to save hardware for the adders, to improve the operational speed and at the same time to save precious silicon floor space. Power can also be saved, since the design of the adders is more compact.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will become more apparent and will be best understood by referring to the following detailed description of a preferred embodiment in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart illustrating a preferred embodiment of the method for electronically representing a number of the present invention.
FIG. 2 is a flow chart of a preferred embodiment of the method for adding two numbers according to the present invention.
FIG. 3 shows a block diagram of a preferred embodiment of an adder according to the invention.
FIG. 4 shows a microprocessor system which incorporates the principles of the invention.
FIG. 5 shows an embodiment of a computer system which uses the principles of the invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Referring to the flow chart of FIG. 1, it is explained in more detail how a number V is represented in the format of the invention. After the number V is inputted in step 100, it is decided in step 102 whether the number V is positive. The way this decision is made depends on the way the number V is represented initially. If the IEEE representation is used, the sign bit can be checked to make the determination.
If it is decided in step 102 that the number V is positive, the control goes to step 104, in which the number V is put into the form 01.F. The exponent of the number V is represented in two's complement and adapted to the normalization into 01.F correspondingly.
First, in step 106 the number V is shifted a number of times so that a leading 01 before the decimal point results. This corresponds to the format 01.F, where F stands for the fractional bits behind the decimal point.
Second, in step 108 the exponent of the number V is adapted according to the number of shifts performed in step 106. If the number V is shifted to the left in step 106 in order to obtain the 01.F format, the shift has a negative value. This value is added to the initial exponent of the number V, if any, so that a left shift decreases the exponent. If the number V did not initially have an exponent, the number of shifts of step 106 becomes the exponent of the number V. The exponent is represented in two's complement.
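As a rough illustration of steps 106 and 108 for a positive number that has no initial exponent, the following Python sketch normalizes a positive integer to the form 01.F and reports the resulting exponent. The function name `normalize_positive`, the integer input and the truncation of surplus fraction bits are assumptions made for this example only.

```python
def normalize_positive(v: int, n: int) -> tuple[int, int]:
    """Return (exponent, fraction) such that v is about 1.F * 2**exponent.

    The fraction F is kept to n bits; lower bits are truncated (an assumption,
    the text does not discuss rounding).
    """
    if v <= 0:
        raise ValueError("this sketch covers the positive branch only")
    exponent = v.bit_length() - 1          # shifts needed to leave a single leading 1
    below_leading_one = v - (1 << exponent)
    if exponent >= n:
        fraction = below_leading_one >> (exponent - n)   # drop surplus low bits
    else:
        fraction = below_leading_one << (n - exponent)   # pad with zeros
    return exponent, fraction


# 1011 (decimal 11) becomes 01.011 with exponent 3, since 1.011b * 2**3 = 11
result = normalize_positive(0b1011, 3)
```

Here the point moves to the left of three bits, which counts as a positive shift, so the exponent becomes +3; a number that had to be shifted left instead would end up with a negative exponent.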
In step 110 the leading mantissa bit 0 of the mantissa 01.F is dropped. The result is outputted in step 112. The result consists of a binary data word 114 which has exponent bits E_V 116 and mantissa bits M_V 118.
The exponent E_V is represented without the sign bit in two's complement form. The mantissa M_V has a length of N+1 bits M0_V, M1_V, M2_V, ..., MN_V. The leading most significant bit M0_V is set equal to 1 to indicate that the mantissa is positive. The remaining mantissa bits M1_V, M2_V, ..., MN_V are the fraction F of the format 01.F to which the number V was shifted in step 106.
If it is decided in step 102 that the number V is negative, the control goes to step 120 to convert the mantissa as well as the exponent into two's complement representation, to normalize the mantissa and to adapt the exponent correspondingly.
First, in step 122 the number V is converted into a two's complement representation. For the conversion, all digits of the number V are inverted and 1 is added to the least significant bit of the inverted number V. In step 124 the converted number V is shifted a number of times so that the format 10.F results, similar to the shifting of step 106. The exponent of the number V is also adapted correspondingly and likewise represented in two's complement.
In step 128, similar to step 110, the leading most significant mantissa bit, which is 1, is dropped. The result is obtained in step 130 and again consists of the exponent bits E_V 116 and the mantissa bits M_V 118. As opposed to the result obtained in step 112, the mantissa bit M0_V equals 0 to indicate that the value of the number V is negative.
In the following, examples are given of how a number V is represented in the format of the invention.
In the first example the number V equals -1.011 and is initially represented in the IEEE format.
Since the number V is negative, which is indicated by the sign bit in the IEEE format, the two's complement has to be determined first. The sign bit "-" is represented by 0, so that the initial IEEE representation of V results as 01.011. In two's complement this is 10.101, obtained by inverting all bits of 01.011 to 10.100 and adding 00.001. The original exponent of V, if any, is represented in two's complement and otherwise remains unchanged. In this case no shifting was necessary to create the format 10.F. The resulting mantissa M_V is therefore M0_V = 0, M1_V = 1, M2_V = 0 and M3_V = 1, which corresponds to the fraction F = 101 of the 10.101 representation of V.
In the second example the number V equals 1.010 and is also initially represented in the IEEE format. As V is positive it stays 01.010 and the exponent is the same. The resulting fraction F is 010.
In the next example V equals -1.000, again in IEEE format. The two's complement of 01.000 is 11.000. This does not correspond to the required format 10.F and must therefore be normalized. Shifting 11.000 one position to the left results in 10.000. This requires that the original exponent of V is decremented by one.
If the actual value of the number V in the format of the invention is to be determined, this is done by evaluating
$V = (1 + 0.F) \cdot 2^{E}$ (2)
for the case that the sign bit M0_V = 1 and thus V is positive, or
$V = (-2 + 0.F) \cdot 2^{E}$ (3)
in case that the sign bit M0_V = 0 and thus V is negative, where 0.F denotes the fraction bits read as a binary fraction.
Examples are shown in table 1 below.
In the example considered in table 1 there are 4 bit positions in the mantissa M_V. No exponents are shown in table 1; the exponents are assumed to be equal to zero. The leftmost column of table 1 shows the mantissas M_V of numbers which are represented according to the invention.
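The three worked examples above can be reproduced with a small Python sketch. It assumes the input is given as a sign plus a normalized magnitude 1.F with n fraction bits, and it decodes by reading 01.F and 10.F as two's complement numbers with two integer bits, which gives 1 + 0.F for positive and -2 + 0.F for negative values. The function names `encode` and `decode` and their argument layout are hypothetical.

```python
def encode(negative: bool, frac: int, exponent: int, n: int) -> tuple[int, int]:
    """Convert sign and magnitude 1.F * 2**exponent to (exponent, stored mantissa).

    The stored mantissa has n + 1 bits: M0 (1 = positive, 0 = negative), then F.
    """
    width = n + 2                          # two integer bits plus n fraction bits
    mask = (1 << width) - 1
    pattern = (1 << n) | frac              # magnitude written as 01.F
    if negative:
        pattern = (~pattern + 1) & mask    # two's complement: invert, add one
        # Only -1.000... gives 11.000...; shift left until the form is 10.F
        while (pattern >> n) & 0b11 == 0b11:
            pattern = (pattern << 1) & mask
            exponent -= 1
    stored = pattern & ((1 << (n + 1)) - 1)  # drop the leading (hidden) bit
    return exponent, stored


def decode(exponent: int, stored: int, n: int) -> float:
    """Evaluate a stored mantissa and exponent as a real number."""
    fraction = (stored & ((1 << n) - 1)) / (1 << n)   # 0.F as a binary fraction
    positive = (stored >> n) & 1 == 1                 # sign bit M0
    base = 1 + fraction if positive else -2 + fraction
    return base * 2.0 ** exponent


first = encode(True, 0b011, 0, 3)    # V = -1.011  -> mantissa 0101
second = encode(False, 0b010, 0, 3)  # V = +1.010  -> mantissa 1010
third = encode(True, 0b000, 0, 3)    # V = -1.000  -> mantissa 0000, exponent -1
```

Decoding the three results gives -1.375, 1.25 and -1.0, which match the binary values -1.011, 1.010 and -1.000 of the examples.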
Starting from the top of the table, the numbers having a leading 0, in other words M0_V = 0, are negative, whereas the numbers in the lower portion of table 1 have a most significant bit which is 1, in other words M0_V = 1, and are therefore positive. The digits after the most significant bit, in this case three bits, are representative of the fraction F of the numbers V.
The middle column of table 1 shows the expanded mantissas of the numbers V of the leftmost column. For the negative numbers this means that a 1 is prepended as the most significant bit. This is the inversion of step 128, in which the leading 1 was dropped. In the table the leading 1 appears in brackets. The decimal point is also shown in the middle column of table 1, corresponding to the normalization performed in step 124.
The same applies analogously to the positive numbers V, for which a 0 in brackets is added as an inversion of step 110. The decimal point is also shown, corresponding to the normalization of step 106. Using the fraction F as an input to equations 2 and 3, respectively, the resulting value is shown in the rightmost column as a binary value, whereby it is assumed that the exponent equals 0 for all the numbers V.
If the exponent of a number V is not equal to 0, the real value is obtained by shifting the result shown in the rightmost column a number of times corresponding to the exponent.
In the following, with reference to FIG. 2, it is shown how the unique format of the invention to represent a number V can be advantageously used if two such numbers are to be added. In step 200 a number X and a number Y which are to be added are inputted. Both X and Y are in the format of the invention.
In step 202 the absolute difference D of the exponents E_X and E_Y is determined. In step 204 it is determined which of the exponents E_X and E_Y is bigger. In step 206 the preliminary assumption is made that the exponent of the result of the summation of X and Y equals the bigger one of the exponents E_X and E_Y.
In step 208 the mantissas M_X and M_Y are expanded as shown in the middle column of table 1. This means that the leading most significant bit, which is 0 for a positive number and 1 for a negative number, is reintroduced into the representation of the mantissas to invert steps 110 and 128, respectively.
In step 210 the mantissa of the operand X or Y with the smaller exponent is shifted D positions to the right. The information as to which of the mantissas has the smaller exponent is obtained from the result of step 204.
In step 212 the mantissa which was shifted in step 210 and the other expanded mantissa, which was not shifted, are added. For adding the two mantissas no sign logic is needed, since both the shifted and the unshifted mantissas are represented as two's complement numbers.
In step 214 it is evaluated whether an overflow occurred when the shifted and the unshifted mantissas were added in step 212. Overflow occurred if the shifted and the unshifted mantissas have the same most significant bit and the result of the summation has a different most significant bit. If this is the case, the control goes to step 216, in which one is added to the preliminary exponent of the result as obtained in step 206. Further, in step 216 the result of the added mantissas obtained in step 212 is shifted one position to the right in order to adjust the decimal point. The result obtained in step 216 is a final result and is represented in the format of the invention.
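Steps 202 to 216 can be sketched in Python as follows. The sketch works on expanded mantissas, i.e. (n + 2)-bit two's complement patterns of the form 01.F or 10.F; the function name `align_add`, the tuple it returns, the sign-extending right shift and the way the overflow case recomputes the sum at full precision are all assumptions of this example rather than details taken from the text.

```python
def align_add(ex: int, mx: int, ey: int, my: int, n: int) -> tuple[int, int, bool]:
    """Align two expanded mantissas, add them and handle overflow.

    Returns (preliminary or final exponent, sum pattern, overflow flag).
    """
    width = n + 2
    mask = (1 << width) - 1
    sign = 1 << (width - 1)

    def signed(p: int) -> int:
        """Read a width-bit pattern as a two's complement integer."""
        return p - (1 << width) if p & sign else p

    d = abs(ex - ey)                             # step 202: exponent difference
    if ex >= ey:                                 # steps 204 and 206
        e, big, small = ex, mx, my
    else:
        e, big, small = ey, my, mx
    shifted = (signed(small) >> d) & mask        # step 210: sign-extending shift
    total = (big + shifted) & mask               # step 212: plain addition
    # Step 214: both inputs share an MSB that the sum does not have
    same_sign = (big & sign) == (shifted & sign)
    overflow = same_sign and (total & sign) != (big & sign)
    if overflow:                                 # step 216
        e += 1
        # Shift right by one, bringing the lost carry back in as the new MSB
        total = ((signed(big) + signed(shifted)) >> 1) & mask
    return e, total, overflow


# 1.5 + 1.5: 01.100 + 01.100 overflows and yields 01.100 with exponent 1 (= 3.0)
pos = align_add(0, 0b01100, 0, 0b01100, 3)
# -1.5 + -1.5: 10.100 + 10.100 overflows and yields 10.100 with exponent 1 (= -3.0)
neg = align_add(0, 0b10100, 0, 0b10100, 3)
```

When no overflow is reported, the returned pattern may still carry a run of leading sign bits and has to be renormalized before the hidden bit is dropped.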
If it is determined in step 214 that no overflow occurred, a sequence of leading 0 or 1 bits is to be detected in the result obtained by adding the shifted and unshifted mantissas in step 212. The detection of the sequence of leading 0 or 1 bits is done in step 218.
The length of the sequence of leading 0 or 1 bits is denoted L in the following. If it is detected in step 220 that the result obtained in step 212 consists only of zeros, this indicates that the result of the addition is in fact equal to zero. As a value of zero cannot be represented in the mantissa when it is in a format according to the invention, the value of zero is encoded in the exponent. This is done by assigning a predetermined value to the exponent of the result; the predetermined value is indicative of the value zero of the result. For this purpose any possible exponent value can be selected. In the example considered here the exponent is assigned the value 10000000 in an 8 bit representation.
If it is determined in step 220 that the sequence detected in step 218 does not consist only of zeros, the control goes to step 224. In step 224 the result obtained in step 212 is renormalized to the format of the invention. This is done by shifting the result obtained by adding the shifted and unshifted mantissas L-1 times to the left and correspondingly subtracting L-1 from the preliminary exponent of the result obtained in step 206. The resulting number has the form 01.F or 10.F, depending on whether the number is positive or negative. Since the leading most significant bit in the formats 01.F and 10.F is redundant, it is dropped in step 226, corresponding to the respective steps 110 and 128 of FIG. 1.
With reference to FIG. 3, an adder circuit is now described which can add the two numbers X and Y. In the example considered here the exponents are 8 bits wide and the mantissas are 24 bits wide. In the representation of steps 112 and 130 of FIG. 1 this means that there are 24 mantissa bits M0-M23. The exponents E_X and E_Y to be
inputted into the adder shown in FIG. 3 again are in two's complement form, and the mantissas are normalized in the way described with reference to FIG. 1.
The adder shown in FIG. 3 has a subtractor 300 which has two inputs to receive the exponents E_X and E_Y. Further, the adder of FIG. 3 has a zero detector and multiplexer 302 which also receives the exponents E_X and E_Y as input values. The subtractor 300 has a control output 304 which indicates which one of the exponents E_X or E_Y is the bigger one.
The control output 304 is coupled to the zero detector and multiplexer 302 as well as to swap circuit 306. The swap circuit 306 receives the mantissas M_X and M_Y as 24 bit inputs. The swap circuit 306 has a control input 308 which is coupled to the control output 304. Further, the swap circuit 306 has data outputs 310 and 312.
The data outputs 310 and 312 are one bit wider than the inputs of the swap circuit 306, in this case 25 bits instead of 24 bits. The data output 310 of the swap circuit 306 is coupled to barrel shifter 314 as a data input. The barrel shifter 314 has a control input 316 which is coupled to control output 318 of the subtractor 300.
The barrel shifter 314 has a control output 318 which is coupled to a data input of adder block 320. The other data input of adder block 320 is coupled to the data output 312 of the swap circuit 306.
The zero detector and multiplexer 302 has its output coupled to subtractor/adder-by-1 block 322 as a data input. The other input of the subtractor/adder-by-1 block 322 is coupled to output 324 of leading most significant bit detector 326.
The adder block 320 has an overflow output which is coupled via line 328 to the subtractor/adder-by-1 block 322 and to barrel shifter 330. The barrel shifter 330 has its data input coupled to the data output of the adder block 320 via line 332. The line 332 is 25 bits wide. The barrel shifter 330 is also coupled to the output 324 of the leading msb detector 326.
The leading msb detector 326 is also coupled
via output line 334 to the subtractor/adder-by-1 block 322. The exponent E_Z of the result Z of the summation of X and Y is present at the output 336 of the subtractor/adder-by-1 block 322, and the normalized mantissa M_Z of the result Z is present at the output 338 of the barrel shifter 330.
In operation, the exponent bits E_X and E_Y as well as the mantissa bits M_X and M_Y of the two numbers X and Y to be added are inputted simultaneously into the adder circuit. By means of the subtractor 300 the absolute difference D of the exponents E_X and E_Y is determined.
If the difference D is bigger than the width of the mantissa input into swap circuit 306, in this case 24 bits, the width of the mantissa input is taken as the difference D, since this is the maximum number of shifts which can be performed. This corresponds to step 202 of FIG. 2.
The subtractor 300 also determines which one of the exponents E_X and E_Y is the bigger one. This corresponds to step 204 of FIG. 2. The information as to which one of the exponents is bigger is available at the control output 304. According to the logical value of the control output 304, the zero detector and multiplexer 302 is controlled to output the bigger one of the exponents E_X and E_Y to the subtractor/adder-by-1 block 322. This corresponds to step 206 of FIG. 2.
The information as to which one of the exponents E_X or E_Y is bigger is also inputted into the swap circuit 306 at its control input 308. The swap circuit 306 swaps the inputs M_X and M_Y so that the mantissa M of the one of the numbers X or Y having the smaller exponent is outputted at the data output 310 to the barrel shifter 314.
The result of the determination of the difference D is available at the control output 318 of the subtractor 300 and is inputted into the control input 316 of the barrel shifter 314.
In the swap circuit 306 the hidden most significant bit is included in the mantissas M_X and M_Y, corresponding to step 208 of FIG. 2. As a consequence the data outputs 310 and 312 of the
swap circuit 306 are one bit wider than the mantissa inputs, in this case 25 bits wide. The barrel shifter 314 shifts the expanded mantissa of the operand having the smaller exponent D positions to the right, corresponding to step 210 of FIG. 2.
The result of this shift operation is available at the control output 318 of the barrel shifter 314 and is still 25 bits wide. Subsequently both the shifted and the unshifted mantissas are inputted into the adder block 320.
If an overflow occurs when the shifted and unshifted mantissas are added in the adder block 320, this is indicated by line 328 both to the subtractor/adder-by-1 block 322 and to the barrel shifter 330. This has the effect that the value of the output line 334 is ignored by the subtractor/adder-by-1 block 322 and that 1 is added to the exponent inputted by the zero detector and multiplexer 302 into the subtractor/adder-by-1 block 322. The result of this addition is the final result of the exponent E_Z, which is outputted at output 336. Correspondingly, the barrel shifter 330 shifts the result outputted by adder block 320 via line 332 one position to the right and drops the leading most significant bit, so that the resulting mantissa M_Z is obtained at output 338. This corresponds to step 216 of FIG. 2.
If no overflow occurs in the adder block 320 (cf. step 214 of FIG. 2), the leading most significant bit detector 326, which has its data input coupled to the data output of the adder block 320, detects a sequence of leading 0 or 1 bits and determines the length L of the sequence, as explained with respect to step 218 of FIG. 2. The value of L is available at the output 324 of the leading msb detector 326. If the value of L reveals that the result of the summation in adder block 320 is zero, this is notified by the leading msb detector 326 to the subtractor/adder-by-1 block 322 via load output line 334, and a predetermined value which is indicative of the result being zero is loaded into the subtractor/adder-by-1 block 322. This
loaded value is the resulting exponent E_Z. This corresponds to step 222 of FIG. 2.
If the result obtained by adder block 320 is not zero, L-1 is subtracted from the exponent inputted by the zero detector and multiplexer 302 into the subtractor/adder-by-1 block 322 in order to obtain the resulting exponent E_Z. Correspondingly, the mantissa is normalized by shifting it L-1 times to the left in barrel shifter 330. Again the leading most significant bit is dropped in the barrel shifter 330, so that a 24 bit wide resulting mantissa M_Z is obtained. This corresponds to step 226 of FIG. 2.
In case the result obtained at the output of adder block 320 is zero, the value of the resulting mantissa M_Z is a don't care, because the value of the exponent indicates that the number Z is in fact zero. If, however, one of the input values X or Y is zero, this is detected in the zero detector and multiplexer 302, which compares both exponents E_X and E_Y with the predefined exponent value which is indicative of zero, in this case 80h. If zero is detected by the zero detector and multiplexer 302, this is notified to the swap circuit 306 via line 340, and the mantissa of the corresponding number X or Y which is 0 is filled with 0 to overwrite any don't care values.
With reference to FIG. 4 it is explained in greater detail, with respect to a preferred embodiment, how the invention can be used for computing purposes. FIG. 4 shows an electronic system 400 which can be any electronic device requiring some kind of computing and/or digital signal processing. Typical examples are telecommunication devices such as cellular phones.
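Returning briefly to the adder of FIG. 3: the no-overflow path described above (leading-bit detection, renormalization by L-1 left shifts, and the reserved exponent for a zero result) can be sketched in Python. The names `leading_run` and `renormalize`, the signed-integer exponent, and the use of -128 as the signed reading of the 8-bit pattern 80h are assumptions made for this illustration.

```python
ZERO_EXPONENT = -128  # signed reading of the 8-bit pattern 10000000 (80h)


def leading_run(pattern: int, width: int) -> int:
    """Length L of the run of identical bits starting at the MSB."""
    msb = (pattern >> (width - 1)) & 1
    run = 0
    for i in range(width - 1, -1, -1):
        if (pattern >> i) & 1 != msb:
            break
        run += 1
    return run


def renormalize(exponent: int, total: int, n: int) -> tuple[int, int]:
    """Bring an (n + 2)-bit sum back to 01.F or 10.F and drop the hidden bit.

    Returns (exponent, stored mantissa of n + 1 bits).
    """
    width = n + 2
    if total == 0:
        return ZERO_EXPONENT, 0            # zero is encoded in the exponent
    shift = leading_run(total, width) - 1  # L - 1 left shifts
    pattern = (total << shift) & ((1 << width) - 1)
    stored = pattern & ((1 << (n + 1)) - 1)  # drop the leading bit
    return exponent - shift, stored


# 00.010 (0.25) has L = 3, so it becomes 01.000 with the exponent lowered by 2
quarter = renormalize(0, 0b00010, 3)
```

The mantissa returned for a zero result is arbitrary in this sketch, in line with the text treating it as a don't care.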
The electronic system 400 has a program storage 402 and memory 404. Computing unit 406 is coupled via a bi-directional bus 408 to the memory 404. A program stored in the program storage 402 can be loaded into the computing unit 406 via line 410.
The memory 404 contains a number of data words which are represented in a format according to the invention. One of the data words is shown by way of example as data word 412. When the computing unit 406 has to carry out some kind of digital signal processing calculation, it loads the corresponding computer program from the program storage 402. In order to carry out the digital signal processing program, data words have to be fetched via the bi-directional bus 408 from the memory 404. The data required for carrying out the computer program is in the unique format according to the invention.
This makes it possible to take advantage, in the computing unit 406, of the improved adding of numbers which are represented in a format according to the invention. For example, if the computing unit is a microprocessor, the microprocessor can comprise one or more adders of the type shown in FIG. 3 to carry out large numbers of summations more economically.
FIG. 5 shows a block diagram of a computer system in which the unique representation of a number according to the invention is particularly beneficial. The input log converter 500 receives input data words to be inputted into the computer system. An input data word is logarithmized by the input log converter 500 and inputted into the first register R0 of data pipeline 502.
The data pipeline 502 consists of a number of registers R0 to Rn which are coupled together to form a shift register chain. Each of the registers Ri is coupled to its corresponding computing unit CUi. Each of the computing units CU0-CUn can access its corresponding register Ri to read the data word which is stored in that register.
Each of the computing units CU0-CUn has an output which is coupled to the inverse log converter 504. The inverse log converter 504 performs an inverse logarithm operation on the output of the computing unit CUi to transform the result of the computation back into the normal domain. The results which are obtained by inverting the outputs of the computing units CUi are transferred to an accumulator 506, which adds all the results so that the final output results at the output 508 of the accumulator 506.
In operation, a sequence of data input words is received by the input log converter 500, and a resulting sequence of input data in the log domain is shifted into the data pipeline 502. Each computing unit CUi accesses its corresponding register Ri to obtain the corresponding data input value. A computation is performed in the computing unit CUi, and the result is outputted to the inverse log converter 504 to transform the result of the computation back from the log domain into the normal domain.
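The data flow of FIG. 5 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `log_domain_fir` is hypothetical, base-2 logarithms are an assumption, and it assumes strictly positive inputs and coefficients so that no sign handling is needed. It also assumes the filter case, in which each computing unit adds a stored log-domain coefficient to the value in its register.

```python
import math


def log_domain_fir(samples, coeffs):
    """Weighted sum of samples computed through the log domain.

    Hypothetical helper: assumes all samples and coefficients are > 0.
    """
    # Input log converter 500: move the input words into the log domain.
    log_samples = [math.log2(x) for x in samples]
    # Coefficients are assumed to be stored in the log domain as well.
    log_coeffs = [math.log2(c) for c in coeffs]
    # Each computing unit CUi adds its coefficient to register Ri;
    # an addition in the log domain is a multiplication in the normal domain.
    log_products = [r + c for r, c in zip(log_samples, log_coeffs)]
    # Inverse log converter 504: back to the normal domain.
    products = [2.0 ** p for p in log_products]
    # Accumulator 506: sum all results.
    return sum(products)


result = log_domain_fir([1.0, 2.0, 4.0], [0.5, 0.25, 0.125])
```

With these inputs every product is 0.5, so the accumulated result is 1.5. The sketch uses ordinary Python floats; it does not model the number format or the adder of FIG. 3.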
All the results of the computing units are accumulated in the accumulator 506 after the inverse log operation which is performed by inverse log converter 504. The computation which is carried out in the computing units CUi can be of a finite impulse response filter or infinite impulse response filter type. In this case each of the computing units CUi has one coefficient of such a filter operation stored in an internal register, which is not shown in the drawing for simplicity. To perform such a filter operation, in each computing unit the corresponding coefficient has to be multiplied with the input data word stored in the corresponding register. Since this multiplication is carried out in the log domain, the multiplication becomes a summation. In this case the computing units CUi are in fact adders, which can be implemented by means of an adder of the type shown in FIG. 3, provided that both the input data words in the log domain which are stored in the registers Ri and the coefficients of the filter operations which are stored in the computing units are represented in a format according to the principles of the invention. Since a large number of computing units exists in an architecture of the type shown in FIG. 5, the use of an adder of the type shown in FIG. 3 has a very substantial positive effect. The same applies analogously to the implementation of the accumulator 506, which can also be realized by adders of the type shown in FIG. 3, again provided that the output of the inverse log converter 504 is represented in a format in accordance with the principles of the invention.
Normalization of a floating point number.
This all depends upon the way floating point numbers are stored. Forget binary for now, think in decimal.
If I have the value 87.6 then I can write it as 87.6 x 10^0, 8.76 x 10^1, 0.876 x 10^2, or 0.0876 x 10^3. Normalisation is simply the process of choosing which of these is best, according to some rules. In decimal, we normally choose 0.876 x 10^2, because it follows these simple rules:
- The mantissa has no non-zero digits before the decimal point.
- The mantissa has a non-zero digit immediately after the decimal point.
Another way of writing this is that the mantissa is in the range 0.1 to 0.99999...
Applying this to binary floating point numbers: when we normalise a binary number we have to apply the same rules to the mantissa. It must have no non-zero digits before the decimal (I mean, binary) point, and a non-zero digit immediately after the binary point. Or to put it another way, it must be in the range 0.5 to 0.999999... in decimal.
We do this for several reasons: 1. It gets the best use out of our available bits. 2. It simplifies the hardware required to do arithmetic.
Of course, when we normalise in either decimal or binary, we have to adjust the exponent accordingly to keep the same value.
Bob, 3 years ago.
A number is normalized in order to get the greatest precision. This is done by multiplying the number by some power of the number base (radix) to get it into a particular range, where it is then truncated or rounded to a fixed number of digits.
Since floating point formats have a fixed number of digits, moving the leading digit as far left as possible leaves the most room for low order digits to be retained. That's what normalization does, primarily. It avoids wasting digit positions by storing leading zeroes.
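As a small illustration of the rules in Bob's answer, here is a Python sketch. `math.frexp` and `math.ldexp` are standard library functions; `math.frexp` returns a mantissa in the range 0.5 up to (but not including) 1 together with a power-of-two exponent, which matches the binary rule above. The helper `normalize_decimal` is a hypothetical name written only for this example, and it assumes a positive input.

```python
import math


def normalize_decimal(x):
    """Return (m, e) with 0.1 <= m < 1 and x close to m * 10**e.

    Hypothetical helper for illustration; assumes x > 0.
    """
    e = math.floor(math.log10(x)) + 1
    return x / 10 ** e, e


# Decimal normalisation: 87.6 becomes roughly 0.876 x 10^2.
dec_mantissa, dec_exponent = normalize_decimal(87.6)

# Binary normalisation: the mantissa lands in [0.5, 1) and the
# exponent is adjusted so that the value stays the same.
bin_mantissa, bin_exponent = math.frexp(87.6)

# Rebuilding the number from mantissa and exponent gives back the value.
rebuilt = math.ldexp(bin_mantissa, bin_exponent)
```

Here the binary exponent is 7, because 87.6 lies between 2^6 and 2^7.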
Binary floating point formats can also gain one extra bit of precision by not storing the leading 1 bit. The IEEE 754 binary floating point formats do this, for example, and they are used by almost everyone these days. (Some IBM mainframes still support a base-16 floating point standard inherited from the S/360.) This is only possible in binary, where the leading digit can only be 1. Zero values are indicated by every bit, except perhaps the sign bit, being 0.
If your 8-bit number were to be normalized into an 8-bit field, there's no advantage to normalization. However, if you were to normalize the 16-bit value 00101101 01101001 into an 8-bit field, you'd get 10110101 1, rounded up to 10110110, if the leading 1 bit is stored, or 1 01101011 0, rounded down to 1 01101011, if the leading 1 is not stored.
The bits set apart by a space are leading and trailing bits that are not stored. The bits on the right may be used for rounding, though. There are usually different rounding mode options telling how to handle a normalized result that has to lose some bits on the right.
Just storing the first 8 bits would get you 00101101, only 5 bits after the leading 1. Normalizing raises that to 7 bits after the leading 1. Normalizing and not storing the leading 1 raises that to 8.
husoski, 3 years ago.
To expand just a tiny bit on what Bob said, using his example: 0.876 x 10^2 is really .876 x 10^2, because the zero before the decimal point, while good in print for our eyes, is not needed in the computer representation.
EddieJ, 3 years ago.
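husoski's 16-bit example can be checked with a short Python sketch. The function name `normalize_to_field` and its parameters are hypothetical. Rounding is assumed to look only at the first discarded bit (round half up), which is just one of the possible rounding modes, and the sketch does not track the exponent or handle a rounding carry out of the field.

```python
def normalize_to_field(value, width=8, hidden_bit=False):
    """Normalize a positive integer bit pattern into `width` stored bits.

    Hypothetical helper: rounds on the first discarded bit and ignores
    the exponent and any carry produced by rounding.
    """
    if value <= 0:
        raise ValueError("value must be positive")
    top = value.bit_length()  # number of bits up to and including the leading 1
    if hidden_bit:
        # The leading 1 is implied, so remove it before storing.
        value -= 1 << (top - 1)
        top -= 1
    drop = top - width  # how many low-order bits do not fit
    if drop <= 0:
        return value << -drop  # everything fits; pad on the right
    kept = value >> drop
    round_bit = (value >> (drop - 1)) & 1  # first discarded bit
    return kept + round_bit


word = 0b0010110101101001
stored = format(normalize_to_field(word), "08b")
stored_hidden = format(normalize_to_field(word, hidden_bit=True), "08b")
```

For this input `stored` is `10110110` and `stored_hidden` is `01101011`, matching the two results given in the answer.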