<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-281866467424266708</id><updated>2012-02-16T09:56:04.063-08:00</updated><category term='parallel processing'/><category term='dataflow'/><category term='8 core'/><category term='multicore'/><category term='symmetric multicore'/><category term='pipeline'/><category term='kilocore'/><category term='data parallel'/><category term='SIMD'/><category term='propeller'/><category term='MIMD'/><title type='text'>KiloCore</title><subtitle type='html'>About Kilo core class custom processors or machines.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://kilocore.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/281866467424266708/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://kilocore.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Michael O'Brien</name><uri>http://www.blogger.com/profile/14907623981077693781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_yohDTnOfkgY/TMowTX9EOvI/AAAAAAAAAAc/_zY4yYWMmCk/S220/_michael_oracle_20060908_desk_IMG_0184b.JPG'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>2</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-281866467424266708.post-2359893983651455591</id><published>2011-01-07T20:52:00.000-08:00</published><updated>2011-01-07T20:52:08.999-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dataflow'/><category scheme='http://www.blogger.com/atom/ns#' term='multicore'/><category scheme='http://www.blogger.com/atom/ns#' term='MIMD'/><category scheme='http://www.blogger.com/atom/ns#' term='symmetric multicore'/><category scheme='http://www.blogger.com/atom/ns#' term='pipeline'/><category scheme='http://www.blogger.com/atom/ns#' term='data parallel'/><category scheme='http://www.blogger.com/atom/ns#' term='SIMD'/><title type='text'>Alternate Multicore Architectures</title><content type='html'>Alternate Multicore Architectures&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; The following designs are possible alternatives or successors to my existing hypercube or array processor designs.&lt;br /&gt;&lt;br /&gt;D1: DataFlow Processor&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;The data flows in a single direction in a pipeline or tree&amp;nbsp;fashion where each core performs a single&amp;nbsp;process on the dataset&amp;nbsp;which is passed along to the next node(s).&lt;br /&gt;D2: Data Parallel Processor &lt;br /&gt;&amp;nbsp;&amp;nbsp; This architecture is fine grained and usually assigns a single core to each data point for SIMD oriented user software.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/281866467424266708-2359893983651455591?l=kilocore.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kilocore.blogspot.com/feeds/2359893983651455591/comments/default' title='Post Comments'/><link 
rel='replies' type='text/html' href='http://kilocore.blogspot.com/2011/01/alternate-multicore-architectures.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/281866467424266708/posts/default/2359893983651455591'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/281866467424266708/posts/default/2359893983651455591'/><link rel='alternate' type='text/html' href='http://kilocore.blogspot.com/2011/01/alternate-multicore-architectures.html' title='Alternate Multicore Architectures'/><author><name>Michael O'Brien</name><uri>http://www.blogger.com/profile/14907623981077693781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_yohDTnOfkgY/TMowTX9EOvI/AAAAAAAAAAc/_zY4yYWMmCk/S220/_michael_oracle_20060908_desk_IMG_0184b.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-281866467424266708.post-2140094897342590899</id><published>2010-11-04T09:35:00.000-07:00</published><updated>2010-11-16T13:33:33.510-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='propeller'/><category scheme='http://www.blogger.com/atom/ns#' term='parallel processing'/><category scheme='http://www.blogger.com/atom/ns#' term='kilocore'/><category scheme='http://www.blogger.com/atom/ns#' term='8 core'/><category scheme='http://www.blogger.com/atom/ns#' term='SIMD'/><title type='text'>Kilocore SIMD Multiprocessor Array based on the Parallax Propeller 8-core microcontroller</title><content type='html'>&lt;strong&gt;Purpose:&lt;/strong&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; This blog will follow the construction of a prototype &lt;a href="http://en.wikipedia.org/wiki/SIMD"&gt;SIMD&lt;/a&gt; multiprocessor&amp;nbsp;array of&amp;nbsp;simple 32-bit processors arranged in a mesh architecture.&lt;br /&gt;Details about experiments leading up to this prototype 
were &lt;a href="http://www.objectivej.com/hardware/propcluster/index.html"&gt;detailed here&lt;/a&gt;.&lt;br /&gt;The initial hardware configuration will be on breadboards as I get the connection topology, modularization and grid/host software worked out.&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; I will be using the 160 MIPS &lt;a href="http://www.parallax.com/propeller/"&gt;Parallax Propeller P8X32&lt;/a&gt; DIP&amp;nbsp;(8-core/8-thread) microcontroller as the mesh PU.&lt;br /&gt;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&lt;br /&gt;I may use the superior 1600 MIPS &lt;a href="http://www.xmos.com/products/development-kits/xc-1a-development-kit"&gt;XMOS XS1-G4&lt;/a&gt; (4-core / 32-thread), but XMOS does not currently ship a DIP version of their surface-mount chip as Parallax Inc. does, which makes prototypes difficult to implement. I could, however, use the G4 as the host bridge between the processor array and the PC, but not until I get a replacement for .binary loading of the mesh via the host, which the Propeller excels at.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Requirements:&lt;/strong&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Our goal is to design a software/hardware combination that results in a grid/mesh of equal processing units (PU) controlled by a single host controller that is accessible from a host PC.&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Hardware modules are defined as follows...&lt;br /&gt;&lt;ul&gt;&lt;li&gt;M1: Host PC&lt;/li&gt;&lt;li&gt;M2: Host Controller&lt;/li&gt;&lt;li&gt;M3: PU Grid&lt;/li&gt;&lt;li&gt;M4: Grid Monitor Display (optional)&lt;/li&gt;&lt;/ul&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Software Modules are defined as follows...&lt;br /&gt;&lt;ul&gt;&lt;li&gt;S1: Host PC Interface (&lt;strong&gt;Java&lt;/strong&gt;)&lt;br /&gt;&lt;u&gt;S1.1:&lt;/u&gt; Serial&amp;nbsp;bidirectional connector (&lt;strong&gt;javax.comm&lt;/strong&gt;)&lt;br /&gt;&lt;u&gt;S1.2:&lt;/u&gt; HTTP unidirectional connector (&lt;strong&gt;java.net&lt;/strong&gt;)&lt;br 
/&gt;&lt;u&gt;S1.3:&lt;/u&gt; Persistence connector (&lt;strong&gt;org.eclipse.persistence.jpa&lt;/strong&gt;)&lt;/li&gt;&lt;li&gt;S2: Host Controller (&lt;strong&gt;SPIN/Assembly&lt;/strong&gt;)&lt;br /&gt;S2.1: Serial bidirectional connector&lt;br /&gt;S2.2: LED display driver&lt;br /&gt;S2.3: Grid Clock Generator (Assembly)&lt;br /&gt;S2.4: Grid Parallel Loader&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;S2.5: SIPO Grid Input Register (74hc595 out)&lt;br /&gt;S2.6: PISO Grid Output Register (74hc165/597 in)&lt;/li&gt;&lt;li&gt;S3: Grid PU (&lt;strong&gt;SPIN/Assembly&lt;/strong&gt;)&lt;/li&gt;&lt;/ul&gt;&lt;strong&gt;Constraints:&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;u&gt;C1: Power:&lt;/u&gt; Power consumption under 5A - I am currently using a 15W bench supply.&amp;nbsp; (In the production prototype I may use a 500W supply that has a 25W 3.3V rail, but I will need to load the 12V and 5V rails.)&lt;/li&gt;&lt;li&gt;&lt;u&gt;C2: Grid Bootstrap:&lt;/u&gt; SIMD bootstrap model for the PU (processing unit) grid, or 0..1 EEPROM in total.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;u&gt;Analysis:&lt;/u&gt;&lt;br /&gt;&lt;u&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;S1.1:&lt;/u&gt; Serial&amp;nbsp;bidirectional connector (&lt;strong&gt;javax.comm&lt;/strong&gt;)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; In the past I developed direct port drivers for the PPT and SER port using Visual Studio 6; I could not get the Sun Comm API to work outside of Linux. 
However, I came across a page by Rick Proctor for the Lego RCX Brick at &lt;a href="http://dn.codegear.com/article/31915"&gt;http://dn.codegear.com/article/31915&lt;/a&gt; and at &lt;a href="http://llk.media.mit.edu/projects/cricket/doc/serial.shtml"&gt;http://llk.media.mit.edu/projects/cricket/doc/serial.shtml&lt;/a&gt; which explain how to set up and implement the SerialPortEventListener interface.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; S2.4: Grid Parallel Loader:&amp;nbsp; &lt;/u&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; In the past I used the technique originally posted on the Parallax Propeller Forum by users [&lt;a href="http://forums.parallaxinc.com/forums/default.aspx?f=25&amp;amp;m=301878&amp;amp;p=1"&gt;godzich/Christian, pems&lt;/a&gt;] in 2008.&amp;nbsp; This involved connecting up to 12 Propellers to a single EEPROM and taking advantage of I2C bus mastering by having each previously loaded Propeller reset the next one in serial sequence.&amp;nbsp; Each chip requires 1.3 seconds to boot and we are limited by parasitic capacitance to around 12 chips off a single EEPROM.&amp;nbsp; Therefore I started running into trouble with an 80 chip SIMD grid, where I required 10 EEPROMs for the entire grid: a programming headache.&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Use of the &lt;a href="http://obex.parallax.com/objects/61/"&gt;PropellerLoader&lt;/a&gt; by Chip Gracey was not really feasible without some elaborate 3-state bus mastering logic or use of 160 pins to load all the chips in parallel.&amp;nbsp; However, there was a recent post by [&lt;a href="http://forums.parallax.com/showthread.php?t=124343&amp;amp;page=2"&gt;clock loop&lt;/a&gt;] that expanded on Chip's loader by setting up the whole PU grid to listen on the RX port while only one of the grid chips replies on the TX port.&amp;nbsp; Essentially the host programs one of the grid chips with the others acting as listeners and getting programmed in parallel as long as we 
account for worst case timing.&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; This latest approach to bootstrapping, which uses the Grid controller to load all the Grid PU chips, requires that the SIMD grid SPIN/Assembly code be written to a bytecode &lt;strong&gt;.binary&lt;/strong&gt; file by using the PT IDE command [Run | Compile Current | View Info (F8) | Save Binary File].&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; The issue of determining whether the entire grid was loaded successfully is still solved by having the chips respond to the host using the PISO output grid register, which is read by the host after grid programming.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Topology:&lt;/u&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; We will initially be implementing a 2-dimensional mesh network that may be toroidal.&amp;nbsp; Although a hypercube architecture would be more computationally efficient, with an &lt;strong&gt;O(log(n))&lt;/strong&gt; depth and the ability to simulate tree and mesh architectures itself, the initial program space is local so we do not need arbitrary communication between distant nodes.&amp;nbsp; One of the main reasons we are not implementing a hypercube routing network at this time is that it would require 1-3 of the processors on the 8-core chip for external and internal routing.&amp;nbsp; We would also only be implementing a hypercube of clusters of 4 cores, because a router node for each core would not be efficient.&amp;nbsp; The use of a router-less design allows us fine 1:1 granular control over the network.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_yohDTnOfkgY/TNglOgGZgII/AAAAAAAAADg/LgH2HUs_zdU/s1600/propCAS_16core_module_ext_connect_block_v20100907.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" px="true" 
src="http://3.bp.blogspot.com/_yohDTnOfkgY/TNglOgGZgII/AAAAAAAAADg/LgH2HUs_zdU/s400/propCAS_16core_module_ext_connect_block_v20100907.jpg" width="352" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; The cost of a hypercube network is also &lt;strong&gt;O(n log(n))&lt;/strong&gt; where a mesh network is &lt;strong&gt;O(n)&lt;/strong&gt;, and the processor count must fall on power-of-2 boundaries (I would therefore need to implement 64 or 128 chips, for example, not 80).&amp;nbsp; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Some statistics...&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; The number of wires for a 6 dimensional hypercube would be &lt;em&gt;n-cube = 2 x ((n-1)-cube) + 2^n = n x 2^n = 384 &lt;/em&gt;lines (n = 6), whereas the number of lines for a mesh would be 64 x 24 =&amp;nbsp;1536&amp;nbsp;lines&amp;nbsp;(&lt;strong&gt;&lt;em&gt;these exclude the on-chip internal software connections&lt;/em&gt;&lt;/strong&gt;); we would actually have&amp;nbsp;64&amp;nbsp;x&amp;nbsp;12 = 768.&amp;nbsp; Therefore, for small quantities of processing units, the cost is actually lower for hypercubes.&amp;nbsp; But if we could somehow power up 1024 chips (which I think unlikely due to my inexact implementation of power, capacitance, induction and resistance factors) we would require a 10 dimensional hypercube of 4-cog clusters.&amp;nbsp; The number of wires would be&amp;nbsp;&lt;em&gt;49152&lt;/em&gt; for 4096 cogs in a hypercube vs.&amp;nbsp;1024 x 24 =&amp;nbsp;24576 for an 8-core 8192 cog mesh, so the mesh needs around 1/4 the lines per processing unit.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Design:&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;&lt;u&gt;Software Modules (UML static diagram):&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_yohDTnOfkgY/TNxkXaiHN8I/AAAAAAAAAEI/VpVqDiG2KHM/s1600/pac_uml_v20101111.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img 
border="0" height="640" px="true" src="http://4.bp.blogspot.com/_yohDTnOfkgY/TNxkXaiHN8I/AAAAAAAAAEI/VpVqDiG2KHM/s640/pac_uml_v20101111.jpg" width="459" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;br /&gt;&lt;u&gt;Software Simulation&lt;/u&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; In this section we detail a software abstraction of our actual hardware implementation so we can verify the logic and design of the entire system while it is in use.&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; The following UML class diagram shows a view of the simulation model implemented as a standard JEE6/JPA2 persistence unit.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_yohDTnOfkgY/TOL4PplwP9I/AAAAAAAAAEg/wVD67fB56Yo/s1600/dataparallel_uml_model_v20101116.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="640" px="true" src="http://4.bp.blogspot.com/_yohDTnOfkgY/TOL4PplwP9I/AAAAAAAAAEg/wVD67fB56Yo/s640/dataparallel_uml_model_v20101116.jpg" width="480" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_yohDTnOfkgY/TNgldhx9VNI/AAAAAAAAADs/-pDTo9_kaVU/s1600/pcas_24chip_192core_prototype_20100720.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" px="true" src="http://4.bp.blogspot.com/_yohDTnOfkgY/TNgldhx9VNI/AAAAAAAAADs/-pDTo9_kaVU/s400/pcas_24chip_192core_prototype_20100720.JPG" width="353" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_yohDTnOfkgY/TOH96Pw0F5I/AAAAAAAAAEQ/0DjrHPshqNM/s1600/prop8_mesh_v01_bb.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 
1em;"&gt;&lt;img border="0" height="412" px="true" src="http://1.bp.blogspot.com/_yohDTnOfkgY/TOH96Pw0F5I/AAAAAAAAAEQ/0DjrHPshqNM/s640/prop8_mesh_v01_bb.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;2 chip - 16 core breadboard initial wiring prototype (fritzing.org)&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Implementation:&lt;/strong&gt;&lt;br /&gt;In-use Rectilinear Mesh (no routing)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Pinout Mesh Array chip:&lt;br /&gt;&lt;pre&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; +-----+--+-----+&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in0&amp;nbsp; N6 --&amp;gt; p0&amp;nbsp; |1&amp;nbsp;&amp;nbsp;&amp;nbsp; +--+&amp;nbsp;&amp;nbsp; 40| p31 &amp;lt;-- Host RX&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in1&amp;nbsp; N7 --&amp;gt; p1&amp;nbsp; |2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 39| p30 N/C --&amp;gt; (1 chip TX)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in2&amp;nbsp; NE --&amp;gt; p2&amp;nbsp; |3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 38| p29 SDA --&amp;gt; N/C&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in3&amp;nbsp; E0 --&amp;gt; p3&amp;nbsp; |4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 37| p28 SCL &amp;lt;-- N/C&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in4&amp;nbsp; E2 --&amp;gt; p4&amp;nbsp; |5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 36| p27 --&amp;gt; DONE/LED&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in5&amp;nbsp; E4 --&amp;gt; p5&amp;nbsp; |6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 35| p26 &amp;lt;-- C2 (DATA)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in6&amp;nbsp; E6 --&amp;gt; p6&amp;nbsp; 
|7&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 34| p25 &amp;lt;-- C1&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in7&amp;nbsp; SE --&amp;gt; p7&amp;nbsp; |8&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 33| p24 &amp;lt;-- C0&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; vss |9&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BB&amp;nbsp;&amp;nbsp;&amp;nbsp; 32| VDD&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; boe |10&amp;nbsp; n-grid&amp;nbsp; 31|&amp;nbsp; XO &amp;lt;-- N/C&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; RES_bkst--&amp;gt; res |11&amp;nbsp;&amp;nbsp; mesh&amp;nbsp;&amp;nbsp; 30|&amp;nbsp; XI &amp;lt;-- Host clk&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; VDD |12&amp;nbsp;&amp;nbsp;&amp;nbsp; (8)&amp;nbsp;&amp;nbsp; 29| vss&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in8&amp;nbsp; S1 --&amp;gt;&amp;nbsp; p8 |13&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 28| p23 --&amp;gt; 165 Y7&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in9&amp;nbsp; S0 --&amp;gt;&amp;nbsp; p9 |14&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 27| p22 --&amp;gt; 165 Y6&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in10 SW --&amp;gt; p10 |15&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 26| p21 --&amp;gt; 165 Y5&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in11 W7 --&amp;gt; p11 |16&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 25| p20 --&amp;gt; 165 Y4&lt;br 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in12 W5 --&amp;gt; p12 |17&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 24| p19 --&amp;gt; 165 Y3&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in13 W3 --&amp;gt; p13 |18&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 23| p18 --&amp;gt; 165 Y2&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in14 W1 --&amp;gt; p14 |19&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 22| p17 --&amp;gt; 165 Y1&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in15 NW --&amp;gt; p15 |20&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 21| p16 --&amp;gt; 165 Y0&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; +--------------+&lt;br /&gt;&lt;/pre&gt;Pinout Host chip:&lt;br /&gt;&lt;pre&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; +-----+--+-----+&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RES0 &amp;lt;-- p0&amp;nbsp; |1&amp;nbsp;&amp;nbsp;&amp;nbsp; +--+&amp;nbsp;&amp;nbsp; 40| p31 N/C --&amp;gt; (r)cog_in_all&lt;br /&gt;&amp;nbsp; RDY_STATE &amp;lt;-- p1&amp;nbsp; |2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 39| p30 N/C --&amp;gt; (r)cog_out_all&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 165S &amp;lt;-- p2&amp;nbsp; |3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 38| p29 SDA --&amp;gt; EEPROM&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 165C &amp;lt;-- p3&amp;nbsp; 
|4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 37| p28 SCL &amp;lt;-- EEPROM&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 165D --&amp;gt; p4&amp;nbsp; |5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 36| p27 --&amp;gt; c2&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 595D_S &amp;lt;-- p5&amp;nbsp; |6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 35| p26 --&amp;gt; c1&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 595D_R &amp;lt;-- p6&amp;nbsp; |7&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 34| p25 --&amp;gt; c0&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 595D_A &amp;lt;-- p7&amp;nbsp; |8&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 33| p24 &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; vss |9&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BB&amp;nbsp;&amp;nbsp;&amp;nbsp; 32| VDD&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; boe |10&amp;nbsp; n-grid&amp;nbsp; 31|&amp;nbsp; XO&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; --&amp;gt; res |11&amp;nbsp;&amp;nbsp; host&amp;nbsp;&amp;nbsp; 30|&amp;nbsp; XI&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; VDD |12&amp;nbsp;&amp;nbsp;&amp;nbsp; (8)&amp;nbsp;&amp;nbsp; 29| vss&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
&amp;lt;-- p8&amp;nbsp; |13&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 28| p23 &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;-- p9&amp;nbsp; |14&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 27| p22 &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;-- p10 |15&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 26| p21 &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;-- p11 |16&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 25| p20 &lt;br /&gt;&amp;nbsp;MESH_CLOCK &amp;lt;-- p12 |17&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 24| p19 &lt;br /&gt;&amp;nbsp;MESH_RESET &amp;lt;-- p13 |18&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 23| p18 --&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MESH_RX &amp;lt;-- p14 |19&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 22| p17 --&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MESH_TX &amp;lt;-- p15 |20&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 21| p16 --&amp;gt; LED0&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; +--------------+&lt;br /&gt;&lt;/pre&gt;Deprecated 3-Hypercube (with routers)&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a 
href="http://4.bp.blogspot.com/_yohDTnOfkgY/TNglSL_QLMI/AAAAAAAAADk/9HGWiyFGJvQ/s1600/prop_3hypercube.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="563" px="true" src="http://4.bp.blogspot.com/_yohDTnOfkgY/TNglSL_QLMI/AAAAAAAAADk/9HGWiyFGJvQ/s640/prop_3hypercube.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_yohDTnOfkgY/TNglW_67osI/AAAAAAAAADo/0mfF6-86vFs/s1600/IMG_7978_propCAS_64cog_proto_20100907c.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" px="true" src="http://1.bp.blogspot.com/_yohDTnOfkgY/TNglW_67osI/AAAAAAAAADo/0mfF6-86vFs/s400/IMG_7978_propCAS_64cog_proto_20100907c.JPG" width="300" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Testing:&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Simulation in Software using Java JEE:&lt;/strong&gt;&amp;nbsp;&amp;nbsp; Instead of using VHDL/Verilog we will simulate our SIMD devices in software using Java as the computing substrate along with JPA to persist our model and simulation runs.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;Performance Results:&lt;/u&gt;&lt;br /&gt;Without JPA persistence (in memory Entity creation/traversal only)&lt;br /&gt;[&amp;nbsp; 11&amp;nbsp;&amp;nbsp; 111&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1111&amp;nbsp; 1] iter: 65536 time: 12319 ns&lt;br /&gt;Total time: 2.699895754 sec @ 24273.52978458738 iter/sec&lt;br /&gt;With JPA persistence (Derby 10.5.3.0 on the same server)&lt;br /&gt;[&amp;nbsp; 11&amp;nbsp;&amp;nbsp; 111&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1111&amp;nbsp; 1] iter: 65536 time: 13232403 ns&lt;br /&gt;Total time: 967.985705124 sec @ 67.703479145495 iter/sec&lt;br /&gt;From these 
results we are able to remove the object instantiation overhead from the test and concentrate on persistence times.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/281866467424266708-2140094897342590899?l=kilocore.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kilocore.blogspot.com/feeds/2140094897342590899/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://kilocore.blogspot.com/2010/11/kilocore-simd-multiprocessor-array.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/281866467424266708/posts/default/2140094897342590899'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/281866467424266708/posts/default/2140094897342590899'/><link rel='alternate' type='text/html' href='http://kilocore.blogspot.com/2010/11/kilocore-simd-multiprocessor-array.html' title='Kilocore SIMD Multiprocessor Array based on the Parallax Propeller 8-core microcontroller'/><author><name>Michael O'Brien</name><uri>http://www.blogger.com/profile/14907623981077693781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://2.bp.blogspot.com/_yohDTnOfkgY/TMowTX9EOvI/AAAAAAAAAAc/_zY4yYWMmCk/S220/_michael_oracle_20060908_desk_IMG_0184b.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_yohDTnOfkgY/TNglOgGZgII/AAAAAAAAADg/LgH2HUs_zdU/s72-c/propCAS_16core_module_ext_connect_block_v20100907.jpg' height='72' width='72'/><thr:total>1</thr:total></entry></feed>
