A3SFTT Instruction Set Development

  • Schmolendevice
    19th Apr 2015 Member 0 Permalink

    Intro: Alright, so this is a thread mainly directed towards advanced TPT users in electronics with knowledge of advanced CRAY, ARAY, DRAY, PSTN and FILT mechanics as well as some experience and understanding of computer science and engineering and computer architecture. This thread is about Advanced Solid-Spark Sub Frame Timing Technology (A3SFTT) and developments towards making TPT's first 60 Hz computer capable of executing instructions every frame.

     

    At present I've been working on my designs for a high-speed full adder and more compact binary counters (each of which is currently working fine), and I have set my sights on developing a functional CPU with my current technology.

     

    Considerations:

     

    My first experience in making a 60 Hz design that could theoretically execute instructions every frame came from the save at the bottom (id:1751171). It can load in an instruction every frame, although it can only jump to a different address in program memory every three frames. (See https://powdertoy.co.uk/Discussions/Thread/View.html?Thread=19943 for how this solid-spark stuff works.)

     

    The general operation of an A3SFTT computing system is to have an instruction cache (FRAM storing the machine program) continuously outputting instructions to a solid-spark electronics based decoder that translates instruction ctypes into signals that trigger certain operations to occur within the computer while receiving new addresses every frame from some form of a program counter or TPT FILT binary counting circuit. 

     

    For my present FRAM devices addresses come in the form of FILT ctypes sent through BRAY to a DTEC attached to a PSTN demultiplexer (DMUX) located to the top-left of the FRAM (but to the right of the 6-bit counter). Within a frame of a new ctype entering the PSTN DMUX (yellow and green line of FILT) a new pattern of sparked PSCN will be sent down to the PSTN read/write arm extending it by a certain length.

     

    Every frame, a DTEC particle on the read/write arm head checks for any BRAY ctypes to store at the current FILT location, and a solid-spark ARAY attached to the moving PSTN head is always 'reading' the current location. As long as an address is in the PSTN DMUX, the read/write head will always be at a location, and the ctype of that location will be output at a FILT output at the bottom-left of the FILT memory array. From this line of FILT, new instructions can be fed every frame to a solid-spark based decoder, which converts these ctypes into signals triggering various operations in the computing system.
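
    To make the read/write cycle above a bit more concrete, here is a rough Python toy model. It is only a sketch and not the actual save: the `Fram` class, the 64-location size and the "store first, then read, within the same frame" ordering are my own assumptions for illustration.

```python
# Toy model of the FRAM read/write cycle (not the TPT save itself).
# Assumptions: 64 addressable FILT locations, one address per frame,
# and a store happens only when a BRAY ctype is present in that frame.

FILT_MASK = (1 << 29) - 1  # the 29-bit FILT ctype range mentioned in this thread


class Fram:
    def __init__(self, size=64):
        self.cells = [0] * size

    def frame(self, address, bray_ctype=None):
        """Advance one frame: position the head, optionally store, then read."""
        if bray_ctype is not None:      # the DTEC on the head saw a BRAY ctype
            self.cells[address] = bray_ctype & FILT_MASK
        return self.cells[address]      # the ARAY always reads the head location


fram = Fram()
fram.frame(3, bray_ctype=0x155)         # store a ctype at location 3
# A counter supplies a new address every frame; one ctype comes out per frame.
outputs = [fram.frame(addr) for addr in (2, 3, 4)]
print(outputs)
```

    Each call to `frame` stands for one TPT frame, so a list of addresses from the counter turns directly into a list of ctypes for the decoder.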

     

    Fixed Length Instruction Sets:

     

    As far as I know, only three working computers based on FILT mechanics have been made in TPT so far, by mark2222, LBPHacker and Synergy respectively. I believe mark2222 and Synergy went with instruction sets that support arithmetic and data operations over the full 29-bit range of FILT ctypes, while LBPHacker went with a 28-bit instruction width for a 16-bit architecture. As I understand it, all of these computers' instruction sets made use of most of the 28 to 29 bit range of FILT and stored at most one instruction per FILT particle, processing one instruction (one FILT particle) per CPU clock.

     

    *Also to note, my general belief is that when naming the specifications of a computing system, instruction length/width is 'completely' different from the actual width of the computer architecture. That is, you can have a 29-bit instruction width while the internal buses and functional units can only perform operations on, let's say, 8 or 16 bit values, depending on how the opcode system deals with literals. Hence if something were to be called a 32-bit architecture, I would expect the instruction set to fully support, by some means, definitions for 32-bit literals (or at least 16-bit literals assigned to parts of a 32-bit register) and operations on 32-bit integers at a time.

     

    What I would call these are fixed length instruction sets, which could theoretically ensure execution of a single operation per CPU clock. The problem I have with fixed length instruction sets arises when a complete instruction can be defined that doesn't have to use the entire instruction width (e.g. an instruction that requires no arguments). These instructions, when stored in memory, 'waste' their unused bits and hence may not achieve maximum 'code density.'
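
    As a quick back-of-the-envelope illustration of the 'code density' point, here is a small Python sketch. The instruction mix is entirely made up for the sake of the arithmetic; only the 29-bit slot width comes from the discussion above.

```python
# Hypothetical instruction mix: (useful bits per instruction, how many of them).
# These numbers are invented purely to illustrate the arithmetic.
SLOT_BITS = 29  # one fixed-width instruction per FILT particle

program = [(8, 10), (16, 5), (29, 5)]

used = sum(bits * count for bits, count in program)
slots = sum(count for _, count in program)
wasted = slots * SLOT_BITS - used
density = used / (slots * SLOT_BITS)

print(f"{used} useful bits in {slots} particles, {wasted} wasted, density {density:.2f}")
```

    With a mix like this, roughly half of the stored bits carry no information, which is the waste a variable-length encoding tries to recover.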

     

    I guess it's understandable if the general thinking was to go for a simpler form of RISC or CISC design and avoid the complexity of dividing FILT data into physical bytes, where instructions always occupy every byte in program memory and can overlap between FILT particles. In general, though, the following are my proposals for a type of CISC-like instruction set whose completion is at the base of the final design and production of my first TPT processor.

     

    The Planned SCHM 'BitStormer' Architecture and Instruction Set:

     

    The BitStormer instruction set is an instruction set designed to allow for complete access of the CPU's resources as well as access to external buses, ports, peripherals, devices or memory devices. My plans are to eventually be able to support CPU or device interrupts and stack-based subroutine calls with proper 'push/pop' instructions. 

     

    The present instruction set works upon an idea I call 'High Throughput Execution Architecture,' where the CPU's instruction cache and decoder take advantage of the amount of instruction data the instruction cache can output at a time to process as many instructions as possible within a single CPU clock. The general idea is that in real computing systems there will always be a maximum 'instruction throughput,' or number of bits output to the decoder at a time, and in TPT that 'maximum throughput' given present technology is 29 bits per frame.

     

    In the SCHM BitStormer architecture, FILT memory is divided into individual FILT particles, each capable of storing 3 bytes of data as well as an addressable 4-bit 'nybble.' Every machine instruction starts with a base byte that begins access to all the instructions divided amongst four base categories defined by the two most significant bits of the base byte: 

     

    ~ **I have still yet to find a place to fit my instruction set outline. There was a lot of information I wanted to be able to display, and now double post protection won't allow me to include the info on the actual instruction set until someone replies. It's very hard for me to compress info when the vast majority of it is what I wish to communicate.

     

    Conclusion: So this is essentially my present progress towards making a powerful instruction set for executing as many instructions as possible within each 24-bit FILT particle instruction throughput. It isn't complete, as I have a Word file still growing on my computer. In short: base bytes give access to a variety of base categories for direct literal assignment, register access, memory access and access to complex, powerful instruction strings for sending long lines of compact instructions to any functional unit of your choice. Personally, completing this instruction set and getting others' opinions on its efficiency, as well as on other ways I could improve its functionality, is what currently blocks me from developing my first processor. Sorry that this was rather long, but I hope to get some feedback on the matter. This is all part of my plans to move from RISC designs into something much bigger and more powerful on TPT.

    Edited once by Schmolendevice. Last: 18th Apr 2015
  • mniip
    19th Apr 2015 Developer 2 Permalink
    Humble opinion: the A3SFTT acronym sucks. SSE is inherently better.
  • jacob1
    19th Apr 2015 Developer 0 Permalink
    I agree with mniip, I was just about to post a link to the other thread. This is a lot to read ... but Solid State Electronics does sound better than A3SFTT, whatever that stands for. It sounds like some kind of computer architecture, also FFT is something else :P.
  • Schmolendevice
    19th Apr 2015 Member 0 Permalink

    Sigh. I just needed a good folk to provide objective proof that A3SFTT/MOSFET is insufficient and aesthetically not as pleasing as I'd think it to be. Hehe, SSE thence. 

     

    @jacob1 Well, "Advanced Solid Spark Sub-Frame Timing Technology, or A3SFTT/'aes-fet', which sounds like MOSFET (metal-oxide semiconductor field effect transistor). Just as the MOSFET and CMOS logic families were groundbreaking in speed and size in the real world electronics industry, so can A3SFTT be to the Powder Toy electronics industry." was my argument, but if my thesis is insufficient so be it.

     

    Anyways...

     

    ~

     

    Instruction Set Outline:

     

    INST = (instruction definition equals) [AA][XXXXXX]

    A: 2-bit base addressing mode/category select.

    X: Further instruction specification whose 'meaning' is dependent upon 'argument A.'

    Brackets mark opcode/argument bounds and spaces separate bytes or 4-bit nybbles.
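
    For anyone who prefers to see the layout as code, here is a rough Python sketch of how one FILT particle could be split into its three bytes plus nybble, and how a base byte splits into [AA][XXXXXX]. The bit positions inside the particle (byte 0 most significant, nybble in the lowest four bits) and the function names are my own assumptions for illustration; nothing in the outline fixes them yet.

```python
def unpack_particle(ctype):
    """Split a 28-bit value into three bytes and a 4-bit nybble.

    Assumed layout (not fixed by the outline): byte 0 is the most
    significant byte and the nybble sits in the lowest four bits.
    """
    nybble = ctype & 0xF
    data = (ctype >> 4) & 0xFFFFFF
    return [(data >> shift) & 0xFF for shift in (16, 8, 0)], nybble


def split_base_byte(byte):
    """Return (A, X) for a base byte laid out as [AA][XXXXXX]."""
    return (byte >> 6) & 0b11, byte & 0b111111


# The four base categories selected by the two most significant bits.
CATEGORIES = {
    0b00: "direct assignment",
    0b01: "register operations",
    0b10: "data manipulation and branching",
    0b11: "memory, device and system settings access",
}

# Example particle: bytes 0x44, 0x12, 0x34 and nybble 0xA.
particle = (0b01_000100 << 20) | (0x12 << 12) | (0x34 << 4) | 0xA
bytes3, nybble = unpack_particle(particle)
a, x = split_base_byte(bytes3[0])
print(bytes3, nybble, CATEGORIES[a], bin(x))
```

    Here the first byte, 0x44, has A = 01, so it would start a register operation, with the remaining six bits left for that category to interpret.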

    • 00: Direct Assignment Operations: Involves assignment of literals to any out of 16 different registers accessed by a 4-bit 'ID.' My instruction set will support assignment of either one 16-bit integer to bytes 0 and 1 of the destination register or a single 8-bit literal to any of bytes 0 to 2 of the destination register.
    • INST = [00][MM][RRRR] []
      • M: 2-bit assignment mode.
      • R: 4-bit destination register select from 0000 to 1111: 
      • 0000-0111: 
        • D, AD, CD main port registers, Data, Address Data and Control Data.
        • ACC, A, B, C for generic calculations, operand and result storage.
        • CTR or counter register, a 16-bit binary counter for generic incrementation/decrementation.
      • 1000-1011:
        • GP0-GP3: Support for up to 4 general purpose registers.
      • 1100-1111:
        • STP: Stack pointer/register. Should be an address pointing to the current address in the stack pushed to or that can be popped from.
        • STL: Stack location (thanks to newly obtained knowledge from grade 12 engineering, this register is probably useless)
        • RTN: Return address (probably to be replaced with actual data stored in the stack)
        • FLG: General flag/Status register.
    • 01: Register Operations: Allows transfer of data between any two out of 16 registers. Supports 8 different assignment types that I'll eventually get to. Setting the 4-bit origin and destination register selects to the same value allows you to transfer bytes of data to different locations within the same register.
      • INST = [01][F][MMM][DD] [AAAA][BBBB]
      • F: A bit further specifying whether register transfer operations are to be conducted or rather some other category of operations (free bit that can have any use).
      • M: Assignment mode.
      • D: Destination byte select. Some assignment modes allow you to send one byte from the current register to another. This is where you select the destination byte. (May be given a different purpose for all values of M that don't require a destination byte. For example, I still need a way to directly assign the 4-bit nybble to either a literal or, if possible, one of 6 other nybbles stored in the register.)
      • A and B: 4-bit register IDs for the origin and destination registers of this data transfer.
    • 10: Data Manipulation and Branching: Provides access to all of 8 different categories of functional units each of which there can be four different units within possessing different computational resources and being accessed or controlled with different 'sub instruction sets.'
      • INST = [10][A][XXXXX]
      • A: If 0, following bits and bytes define functional unit access and control operations. If 1, following bits and bytes define conditional and unconditional goto operations.
      • X: Dependent on A.
      • INST = [10][0][TTT][UU]
        • T: Selects the type or category of functional unit you wish to access with this base byte.
          • 000-111: CTRX, ALUX, MLTX, SRX, BLU, DMU, FPUX and QAU for counters, arithmetic logic units, multiplication hardware, shift registers, binary logic units, data manipulation units (setting and obtaining individual bits of data as well as moving bits or bytes within a particular register), floating point units (for IEEE-754 32-bit floating point operation support, takes two registers or memory locations to store a result) and quick access units (initially for quick switching between functional units or functional unit categories without 'exiting' 'functional unit instruction string space' and having to declare another base byte) respectively.
        • U: Selects one out of up to four different functional units within that category provided by the CPU. Amidst my present computer architectural developments is a technology I call 'NBCPISA,' or Non-Byte-Confined Parallel Instruction Stream Architecture which posits the notion of having an instruction set that can have some forms of 'instruction strings' spanning arbitrarily through memory to save memory at the expense of decoder complexity.
        • When accessing a functional unit via a base byte you essentially start an instruction string for decoding by that specific selected functional unit. Each instruction string is made from instruction characters of varying length, all based upon base character definitions or sub instruction bases which trigger access to a variety of varying length operand sets and decoding paths.
        • An instruction character base (yeah, even came up with another name) can specify if the following bits or bytes specify instructions or data for assigning the current functional unit's operands and destination register, operations to be conducted on that functional unit or values to set that functional unit's flags to. Once a single instruction character during decoding has finished having its operands/opcodes defined and decoded, that character ends and definition plus decoding for a new instruction character base begins. The new instruction character base may define a new operation to be executed within the current 24-bit throughput, define a quick switch in access to a different functional unit or functional unit category or invoke an exit of the instruction string leading to decoding of a new base byte.
        • SCHM BitStormer systems decode instructions one FILT pixel per frame, or one 24-bit throughput per frame (TPF). Base bytes may only start at the beginning of one of the three byte positions in a FILT particle. A BitStormer control unit/decoder should be designed to decode as many base bytes as possible within the current throughput (for 24-bit throughputs, I would keep that to two). Following that, if an instruction string has been started, the decoder must also be able to process as many instruction characters as possible within that 24-bit throughput, including exits and switches between functional units or categories. (If an exit instruction ends a few bits before the following byte, decoding of a new base byte begins on that following byte, disregarding the unused bits.)
        • Finally, if operand definition for a base byte or instruction character cannot be completed within a single 24-bit throughput, decoding for that operand definition or character will be resumed in the following frame when that data becomes available. The CPU will be designed to support instruction strings of instruction characters spanning across arbitrary numbers of FILT particles and 24-bit throughputs.
      • INST = [10][1][OOOOO] [opcode and argument data still to be resolved]
      • O: Opcode data still to be resolved and finalized. Overall, goto operations will support jumps based on any flag set of any functional unit to an 8-bit goto address, specified either as a literal or through an indirect memory location (6 bits to select one out of 64 FILT locations in the instruction cache and 2 bits to select which byte to start decoding a base byte from).
      • One of the bases of the SCHM BitStormer architecture is the ability to start an instruction definition on any byte of any FILT particle in memory, and the ability to jump to and begin the decoding of a new base byte from any byte of any FILT particle in the 64-FILT instruction cache.
    • 11: Memory, Device and System Settings Access: Includes all instructions for communication between the CPU, memory and external devices as well as access to system settings and queuing of new processes or threads to specific cores.
      • INST = [11][TT][XXXX]
      • T: Access Type: Four different types of data access, CPU and memory/device, memory to memory, memory and device, device interrupt access.
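
    To tie the field layouts in the outline together, here is a rough Python sketch of the bit slicing for the formats defined so far. Several things here are assumptions on my part and not settled in the outline: the register IDs simply follow the order the registers are listed in, A is taken as the origin and B as the destination, the 6-bit FILT location is taken as the upper bits of the goto address, and all function names are made up.

```python
# Assumed ID order: the order in which the registers are listed in the outline.
REGISTERS = ["D", "AD", "CD", "ACC", "A", "B", "C", "CTR",
             "GP0", "GP1", "GP2", "GP3", "STP", "STL", "RTN", "FLG"]
# Functional unit categories 000-111, in the order given in the outline.
UNIT_TYPES = ["CTRX", "ALUX", "MLTX", "SRX", "BLU", "DMU", "FPUX", "QAU"]


def decode_direct(byte):
    """[00][MM][RRRR]: assignment mode and destination register."""
    return {"mode": (byte >> 4) & 0b11, "register": REGISTERS[byte & 0xF]}


def decode_register_op(byte0, byte1):
    """[01][F][MMM][DD] [AAAA][BBBB]; A = origin, B = destination (assumed)."""
    return {
        "f": (byte0 >> 5) & 0b1,
        "mode": (byte0 >> 2) & 0b111,
        "dest_byte": byte0 & 0b11,
        "origin": REGISTERS[byte1 >> 4],
        "destination": REGISTERS[byte1 & 0xF],
    }


def decode_unit_access(byte):
    """[10][0][TTT][UU]: functional unit category and unit number."""
    return {"type": UNIT_TYPES[(byte >> 2) & 0b111], "unit": byte & 0b11}


def split_goto_address(address):
    """8-bit goto address: 6-bit FILT location, 2-bit start byte.

    The field order is assumed. With three bytes per particle, the outline
    does not say what a start byte value of 3 would mean.
    """
    return (address >> 2) & 0b111111, address & 0b11


print(decode_direct(0b00_01_1000))
print(decode_register_op(0b01_0_010_01, 0b0011_1111))
print(decode_unit_access(0b10_0_001_10))
print(split_goto_address(0b000101_10))
```

    A real decoder would of course do this with solid-spark hardware and not with shifts and masks, but the slicing shows which bits each field would have to be wired to.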
    Edited 3 times by Schmolendevice. Last: 18th Apr 2015