an Internet weblog, by Bryan Hinton
Thursday, April 6, 2023
Multidimensional arrays of function pointers in C
Wednesday, January 12, 2022
Concurrency, Parallelism, and Barrier Synchronization - Multiprocess and Multithreaded Programming
When the currently executing process relinquishes the processor, either voluntarily or involuntarily, another process can execute its program code. This event is known as a context switch, which facilitates interleaved execution. Time-sliced, interleaved execution of program code within an address space is known as concurrency.
The Linux kernel is fully preemptive, which means that it can force a context switch in favor of a higher-priority process. When a context switch occurs, the state of the running process is saved to its process control block, and another process resumes execution on the processor.
A UNIX process is considered heavyweight because it has its own address space, file descriptors, register state, and program counter. In Linux, this information is stored in the task_struct. When a process context switch occurs, this state must be saved and restored, which is a computationally expensive operation.
Concurrency applies to both threads and processes. A thread is an independent sequence of execution within a UNIX process, and it is also considered a schedulable entity. Both threads and processes are scheduled for execution on a processor core, but thread context switching is lighter in weight than process context switching.
In UNIX, processes often have multiple threads of execution that share the process's memory space. When multiple threads of execution are running inside a process, they typically perform related tasks. The Linux user-space APIs for process and thread management abstract many of these details. Still, scheduling behavior can be tuned: shorter time quanta mean more frequent switching between schedulable entities, which favors responsiveness, while longer quanta reduce context-switch overhead and favor throughput.
In the 1:1 model, one user-space thread is mapped to one kernel thread. This allows for true parallelism, as each thread can run on a separate processor core. However, creating and managing a large number of kernel threads can be expensive.
In the 1:N model, multiple user-space threads are mapped to a single kernel thread. This is more lightweight, as there are fewer kernel threads to create and manage. However, it does not allow for true parallelism, as only one thread can execute on a processor core at a time.
In the M:N model, M user-space threads are mapped to N kernel threads. This provides a balance between the 1:1 and 1:N models, as it allows for both true parallelism and lightweight thread creation and management. However, it can be complex to implement and can lead to issues with load balancing and resource allocation.
Parallelism on a time-sliced, preemptive operating system means the simultaneous execution of multiple schedulable entities over a time quantum. Both processes and threads can execute in parallel across multiple cores or processors, so concurrency and parallelism are both at play on a multi-user system with preemptive time-slicing and multiple processor cores. Affinity scheduling refers to scheduling processes and threads across multiple cores so that their concurrent and parallel execution is close to optimal.
Assigning processes or threads to specific processors or cores minimizes unnecessary migration between cores, which can improve overall system performance by reducing cache misses and increasing cache hits, among other benefits. In contrast, non-affinity scheduling allows processes and threads to run on any available processor or core, which can result in more frequent migration and lower performance.
Wednesday, February 24, 2021
A hardware design for variable output frequency using an n-bit counter
As the switches are moved or the buttons are pressed, the seven-segment display is updated to reflect the numeric output frequency, and the output pin(s) are driven at the desired frequency. The onboard clock runs at 50MHz, and the signal on the output pins is set on the rising edge of the clock input signal (positive edge-triggered). The clock has 50 million rising edges per second, so an output pin toggled on every rising edge completes a full on/off cycle 25 million times per second. An LED attached to one of the output pins would blink 25 million times per second, far too fast for the human eye to follow. Persistence of vision, the time the human eye retains an image after it disappears from view, is approximately 1/16th of a second, so an LED blinking 25 million times per second appears as a continuous light.
scaler <= compute_prescaler((to_integer(unsigned( SW )))*scaler_mlt);
gpiopulse_process : process(CLOCK_50, KEY(0))
begin
if (KEY(0) = '0') then -- async reset
count <= 0;
elsif rising_edge(CLOCK_50) then
if (count = scaler - 1) then
state <= not state;
count <= 0;
elsif (count = clk50divider) then -- auto reset
count <= 0;
else
count <= count + 1;
end if;
end if;
end process gpiopulse_process;
The scaler signal is calculated with the compute_prescaler function, which converts the switch value (SW) to an integer using to_integer, multiplies it by a multiplier (scaler_mlt), and returns the prescaler that controls the frequency of the pulse signal on the output pin. It is important to note that concurrent statements within an architecture are evaluated concurrently and in no particular order, while the sequential statements within a process are evaluated in order, one at a time. Processes themselves execute concurrently with other processes, and each process has its own execution context.
Tuesday, August 25, 2020
Creating stronger keys for OpenSSH and GPG
Create Ed25519 SSH keypair (supported in OpenSSH 6.5+). Parameters are as follows:
-o save in new format
-a 128 for 128 KDF (key derivation function) rounds
-t ed25519 for type of key
ssh-keygen -o -a 128 -t ed25519 -f .ssh/ed25519-$(date '+%m-%d-%Y') -C ed25519-$(date '+%m-%d-%Y')
Create an Ed448-Goldilocks GPG master key and subkeys.
gpg --quick-generate-key ed448-master-key-$(date '+%m-%d-%Y') ed448 sign 0
fpr=$(gpg --list-keys --with-colons "ed448-master-key-08-03-2021" | awk -F: '$1 == "fpr" {print $10; exit}')
gpg --quick-add-key "$fpr" cv448 encr 2y
gpg --quick-add-key "$fpr" ed448 auth 2y
gpg --quick-add-key "$fpr" ed448 sign 2y
Sunday, September 2, 2018
96Boards - JTAG and serial UART configuration for ARM powered, single-board computers
The 96boards CE specification calls for an optional JTAG connection. The specification also indicates that the optional JTAG connection shall use a 10 pin through hole, .05" (1.27mm) pitch JTAG connector. The part is readily available on most electronics sites. Breaking out the pins with long wires and shrink wrapping them is ideal for making sure that each connection is labeled and separate when connecting to a JTAG debugger. While a JTAG connection is not required for flashing or loading the bootloaders onto the board, the JTAG connection is useful for advanced chip-level debugging. The serial UART connection is sufficient for loading release or debug versions of bl0, bl1, bl2, bl31, bl32, the kernel, and userspace. Last but not least, ARM-powered boards, with 12V power input, often require external fans to keep the board cool. As seen in the below photos, two 5V fans were powered from an external power supply. Any work on microcontroller boards should be performed on a grounded surface. Proper grounding procedures should always be followed as most microcontroller boards contain ESD sensitive components.
In the below photos, a 96Boards SBC is mounted on an IP65, ABS plastic junction box for durability. The pins are extended and mounted with screws underneath the junction box. The electrical conduit holes on the side of the junction box are ideal for holding small, project fans. The remaining electrical conduit holes provide a clean place to place the remaining wires from the board - micro USB, USB-C, and 12V power.
Thursday, June 7, 2018
HiKey 960 Linux Bridged Firewall
The Kirin 960 SoC and on-board USB 3.0 make the HiKey 960 SBC an ideal platform for running a Linux bridged firewall. The number of single-board computers with an SoC as powerful as the HiSilicon Kirin 960 is limited.
When compared with the Raspberry Pi series of single board computers (SBC), the HiKey 960 SBC is significantly more powerful. The Kirin 960 also stands above the ARM powered SoCs which reside in most commercial routers.
USB 3.0 makes the HiKey 960 board an attractive option for bridging or routing, filtering network traffic, or connecting to an external gateway via IPSec. Both network traffic filtering and IPSec tunneling can be computationally expensive operations. However, the multicore Kirin 960 is well suited for these types of tasks.
To run an IPSec client tunnel and a Linux bridged firewall connected over 1G ethernet links, certain kernel configuration modifications are needed. Furthermore, the Android Linux kernel for the HiKey 960 board does not boot on a standard Linux root filesystem because it is designed to boot an Android-customized rootfs.
The latest googlesource Linux kernel (hikey-linaro-4.9) for Android (designed to boot Android on the HiKey 960 board) has been customized to remove the Android specific components so that the kernel boots on a standard Linux root filesystem, with the proper drivers enabled for network connectivity via attached 1000Mb/s USB 3.0 to ethernet adapters. The standard UART interface on the board should be used for serial connectivity and shell access. WiFi and Bluetooth have been removed from the kernel configuration. The kernel should be booted off of a microSDHC UHS-I card. The 96boards instructions should be followed for configuring the HiKey 960 board, setting the jumpers on the board, building and flashing the l-loader, firmware package, partition tables, UEFI loader, ARM Trusted Firmware, and optional Op-TEE. Links for the normal Linux kernel configuration, multi-interface bridge configuration, and single interface IPSec configuration are below. Additional kernel config modifications may be needed for certain types of applications.
kernel build instructions
mkdir /usr/local/toolchains
cd /usr/local/toolchains/
wget https://releases.linaro.org/components/toolchain/binaries/latest/aarch64-linux-gnu/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu.tar.xz
tar -xJf gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu.tar.xz
export ARCH=arm64
export CROSS_COMPILE=/usr/local/toolchains/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-
export PATH=/usr/local/toolchains/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin:$PATH
cd /usr/local/src
git clone https://android.googlesource.com/kernel/hikey-linaro
cd hikey-linaro
git checkout -b android-hikey-linaro-4.9
make hikey960_defconfig
make -j8
multi-interface bridge configuration
Bridged configuration with no IP addresses on the dual NIC interfaces (a crossover cable is useful for testing). The bridge interface obtains a DHCP address (/11) from the WLAN router. An aliased interface is added to br0 and assigned a private IP on a different subnet (/8). Spanning tree is enabled on the bridge interface. A basic ebtables and iptables ruleset is below.
brctl addbr <br>
brctl addif <br> <eth1> <eth2>
ifconfig <br> up
ifconfig <eth1> up
ifconfig <eth2> up
brctl stp <br> yes
dhclient <br>
ifconfig <br>:0 <a.b.c.d/sn> up
iptables --table nat --append POSTROUTING --out-interface <br> -j MASQUERADE
iptables -P INPUT DROP
iptables --append FORWARD --in-interface <br>:0 -j ACCEPT
ebtables -P FORWARD DROP
ebtables -P INPUT DROP
ebtables -P OUTPUT DROP
ebtables -t filter -A FORWARD -p IPv4 -j ACCEPT
ebtables -t filter -A INPUT -p IPv4 -j ACCEPT
ebtables -t filter -A OUTPUT -p IPv4 -j ACCEPT
ebtables -t filter -A INPUT -p ARP -j ACCEPT
ebtables -t filter -A OUTPUT -p ARP -j ACCEPT
ebtables -t filter -A FORWARD -p ARP -j REJECT
ebtables -t filter -A FORWARD -p IPv6 -j DROP
ebtables -t filter -A FORWARD -d Multicast -j DROP
ebtables -t filter -A FORWARD -p X25 -j DROP
ebtables -t filter -A FORWARD -p FR_ARP -j DROP
ebtables -t filter -A FORWARD -p BPQ -j DROP
ebtables -t filter -A FORWARD -p DEC -j DROP
ebtables -t filter -A FORWARD -p DNA_DL -j DROP
ebtables -t filter -A FORWARD -p DNA_RC -j DROP
ebtables -t filter -A FORWARD -p LAT -j DROP
ebtables -t filter -A FORWARD -p DIAG -j DROP
ebtables -t filter -A FORWARD -p CUST -j DROP
ebtables -t filter -A FORWARD -p SCA -j DROP
ebtables -t filter -A FORWARD -p TEB -j DROP
ebtables -t filter -A FORWARD -p RAW_FR -j DROP
ebtables -t filter -A FORWARD -p AARP -j DROP
ebtables -t filter -A FORWARD -p ATALK -j DROP
ebtables -t filter -A FORWARD -p 802_1Q -j DROP
ebtables -t filter -A FORWARD -p IPX -j DROP
ebtables -t filter -A FORWARD -p NetBEUI -j DROP
ebtables -t filter -A FORWARD -p PPP -j DROP
ebtables -t filter -A FORWARD -p ATMMPOA -j DROP
ebtables -t filter -A FORWARD -p PPP_DISC -j DROP
ebtables -t filter -A FORWARD -p PPP_SES -j DROP
ebtables -t filter -A FORWARD -p ATMFATE -j DROP
ebtables -t filter -A FORWARD -p LOOP -j DROP
ebtables -t filter -A FORWARD --log-level info --log-ip --log-prefix FFWLOG
ebtables -t filter -A OUTPUT --log-level info --log-ip --log-arp --log-prefix OFWLOG -j DROP
ebtables -t filter -A INPUT --log-level info --log-ip --log-prefix IFWLOG
single-interface ipsec gateway configuration
iptables -t nat -A POSTROUTING -s <clientip>/32 -o <eth> -j SNAT --to-source <virtualip>
iptables -t nat -A POSTROUTING -s <clientip>/32 -o <eth> -m policy --dir out --pol ipsec -j ACCEPT
Thursday, February 1, 2018
A hardware design for XOR gates using sequential logic in VHDL
ModelSim Full Window view with waveform output of the xor simulation. ModelSim-Intel FPGA Starter Edition © Intel
XOR logic gates are a fundamental component in cryptography, and many of the typical stream and block ciphers use XOR operations. Two examples are ChaCha (a stream cipher) and AES (a block cipher).
While many compiled and interpreted languages support bitwise operations such as XOR, the software implementation of both block and stream ciphers is computationally inefficient compared to FPGA and ASIC implementations.
Hybrid FPGA boards integrate FPGAs with multicore ARM and Intel application processors over high-speed buses. The ARM and Intel processors are general-purpose processors. On a hybrid board, the ARM or Intel processor is termed the hard processor system or HPS. Writing to the FPGA from the HPS is typically performed via C from an embedded Linux build (yocto or buildroot) running on the ARM or Intel core. A simple bitstream can also be loaded into the FPGA fabric without using any ARM design blocks or functionality in the ARM core for a hybrid ARM configuration.
The following is a simple hardware design written in VHDL and simulated in ModelSim. The image contains the waveform output of a simulation in ModelSim. The HPS is not used. On boot, the bitstream is loaded into the FPGA fabric. VHDL components are utilized, and a testbench is defined for testing the design. The entity and architecture VHDL design units are below.
-- three-input xnor gate entity declaration - external interface to design entity
entity xnorgate is
port (
a,b,c : in std_logic;
q : out std_logic);
end xnorgate;
architecture xng of xnorgate is
begin
q <= a xnor b xnor c;
end xng;
-- chain of xor / xnor gates using components and sequential logic
entity xorchain is
port (
A,B,C,D,E,F : in std_logic;
Av,Bv : in std_logic_vector(31 downto 0);
CLOCK_50 : in std_logic;
Q : out std_logic;
Qv : out std_logic_vector(31 downto 0));
end xorchain;
architecture rtl of xorchain is
component xorgate is
port (
a,b : in std_logic;
q : out std_logic);
end component;
component xnorgate is
port (
a,b,c : in std_logic;
q : out std_logic);
end component;
component xorsgate is
port (
av : in std_logic_vector(31 downto 0);
bv : in std_logic_vector(31 downto 0);
qv : out std_logic_vector(31 downto 0));
end component;
signal a_in, b_in, c_in, d_in, e_in, f_in : std_logic;
signal av_in, bv_in : std_logic_vector(31 downto 0);
signal conn1, conn2, conn3 : std_logic;
begin
xorgt1 : xorgate port map(a => a_in, b => b_in, q => conn1);
xorgt2 : xorgate port map(a => c_in, b => d_in, q => conn2);
xorgt3 : xorgate port map(a => e_in, b => f_in, q => conn3);
xnorgt1 : xnorgate port map(conn1, conn2, conn3, Q);
xorsgt1 : xorsgate port map(av => av_in, bv => bv_in, qv => Qv);
process(CLOCK_50)
begin
if rising_edge(CLOCK_50) then --assign inputs on rising clock edge
a_in <= A;
b_in <= B;
c_in <= C;
d_in <= D;
e_in <= E;
f_in <= F;
av_in(31 downto 0) <= Av(31 downto 0);
bv_in(31 downto 0) <= Bv(31 downto 0);
end if;
end process;
end rtl;
entity xorchain_tb is
end xorchain_tb;
architecture xorchain_tb_arch of xorchain_tb is
signal A_in,B_in,C_in,D_in,E_in,F_in : std_logic := '0';
signal Av_in : std_logic_vector(31 downto 0);
signal Bv_in : std_logic_vector(31 downto 0);
signal CLOCK_50_in : std_logic;
signal BRK : boolean := FALSE;
signal Q_out : std_logic;
signal Qv_out : std_logic_vector(31 downto 0);
component xorchain
port (
A,B,C,D,E,F : in std_logic;
Av : in std_logic_vector(31 downto 0);
Bv : in std_logic_vector(31 downto 0);
CLOCK_50 : in std_logic;
Q : out std_logic;
Qv : out std_logic_vector(31 downto 0));
end component;
begin
xorchain_instance: xorchain port map (A => A_in,B => B_in, C => C_in,
D => D_in, E => E_in, F => F_in, Av => Av_in,
Bv => Bv_in, CLOCK_50 => CLOCK_50_in, Q => Q_out,
Qv => Qv_out);
clockprocess: process
begin
while not BRK loop
CLOCK_50_in <= '0';
wait for 20 ns;
CLOCK_50_in <= '1';
wait for 20 ns;
end loop;
wait;
end process clockprocess;
testprocess : process
begin
A_in <= '1';
B_in <= '0';
C_in <= '1';
D_in <= '0';
E_in <= '1';
F_in <= '1';
wait for 40 ns;
A_in <= '1';
B_in <= '0';
C_in <= '1';
D_in <= '0';
E_in <= '1';
F_in <= '0';
wait for 20 ns;
A_in <= '0';
B_in <= '0';
C_in <= '1';
D_in <= '0';
E_in <= '1';
F_in <= '0';
wait for 40 ns;
BRK <= TRUE;
wait;
end process testprocess;
end xorchain_tb_arch;
entity xorgate is
port (
a,b : in std_logic;
q : out std_logic);
end xorgate;
architecture xg of xorgate is
begin
q <= a xor b;
end xg;
entity xorsgate is
port (
av : in std_logic_vector(31 downto 0);
bv : in std_logic_vector(31 downto 0);
qv : out std_logic_vector(31 downto 0));
end xorsgate;
architecture xsg of xorsgate is
begin
qv <= av xor bv;
end xsg;