Saturday, September 13, 2025

gRPConf 2025

Here is an AI summary of the talks from the gRPC conference, which have been posted on YouTube. Here is the link to the keynote: https://www.youtube.com/watch?v=OO5w__uDsNc

Core gRPC Concepts and Future Direction

  • Welcome and Key Updates: The conference kicked off by celebrating gRPC's 11th year of growth, highlighting significant community engagement, impressive download statistics (e.g., 43 million weekly for Python), and upcoming features like early access support for Rust. A key goal announced is the project's pursuit of CNCF "Graduated" status, signifying the highest level of maturity.

  • gRPC: A Decade of Innovation: This talk celebrated gRPC's 10th anniversary as an open-source project, noting its rapid adoption by companies like Netflix, Spotify, and LinkedIn. Its success is attributed to high performance via Protocol Buffers and HTTP/2, language agnosticism, and features that enable reliable microservices. Future directions focus on cloud-native enhancements like proxyless service mesh and its expanding role in AI.

  • gRPC's Second Decade Roadmap: The project's future is driven by three pillars: enhancing proxyless service mesh (with features like ExtProc for request modification and ExtAuth for authorization), improving observability (with ChannelZ v2 and new OpenTelemetry metrics), and modernization (including official Rust support and first-class integration for serverless and AI protocols).

  • gRPC: Core Concepts and Lifecycle: A foundational overview explained gRPC as a high-performance RPC framework built on Protocol Buffers and HTTP/2. The talk detailed the lifecycle of a call, from channel creation and name resolution to load balancing, and covered advanced features like interceptors, deadlines, cancellation, and automatic retries for building resilient applications. A minimal deadline-handling sketch follows this list.

  • Protobuf Editions: A new framework called Protobuf Editions was introduced to allow for the controlled evolution of the Protocol Buffers language without disruptive, backward-incompatible changes. Inspired by Rust Editions, it enables users to incrementally adopt new features while ensuring stability and providing clear migration paths.
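
To make the deadline and cancellation features mentioned in the core-concepts talk concrete, here is a minimal Python sketch (my own illustration, not code from the talk); stub_method stands in for any generated unary-unary stub method, and the 0.5 s timeout is just an example value.

import grpc

def call_with_deadline(stub_method, request, timeout_s=0.5):
    """Invoke a generated unary-unary stub method with a per-call deadline."""
    try:
        # timeout= sets the deadline; it is propagated to the server,
        # which can stop working on the request once the deadline passes
        return stub_method(request, timeout=timeout_s)
    except grpc.RpcError as err:
        if err.code() == grpc.StatusCode.DEADLINE_EXCEEDED:
            return None  # deadline hit: the caller decides whether to retry
        raise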

gRPC in Production: Case Studies and Best Practices

  • Netflix - Handling Traffic Spikes: Netflix detailed its system for managing severe traffic spikes using "automated prioritized load shedding." By measuring per-RPC latency and identifying request criticality, the system uses gRPC interceptors and Envoy filters to shed less important traffic, ensuring essential services remain available during extreme load without needing to increase server capacity. A toy interceptor sketch follows this list.

  • Netflix - Managing Contextual Data: A second talk from Netflix explained how they propagate cross-cutting metadata (for chaos engineering, A/B testing, and resiliency) across their vast microservices architecture. They use Protobuf to define the data and custom gRPC interceptors to efficiently transport this context via request headers and response trailers, with robust observability to prevent data loss.

  • Mastercard - Evolving Critical Financial Systems: A panel from Mastercard discussed their adoption of gRPC bidirectional streaming for systems that process billions of transactions. They highlighted benefits in performance and security but also transparently covered challenges in achieving deep observability, adapting to client-side load balancing for persistent connections, and performing rigorous security validation.

  • Reddit - Scalable Service Discovery: Reddit presented their journey from using Kubernetes' default DNS to a sophisticated proxyless gRPC XDS-based system for service discovery. To manage their massive, multi-region scale, they built a custom control plane with a dynamic configuration injector and a validating webhook to protect the system, complemented by extensive client-side observability tools.

  • Apple - Containerization with gRPC Swift: Apple showcased its open-source framework that uses gRPC Swift to run Linux containers inside lightweight, secure virtual machines. A gRPC server running within each VM acts as an agent to orchestrate low-level configurations like process startup and filesystem mounts, communicating with the macOS host over a Virtio socket.

  • LinkedIn - Optimizing for High Performance: Engineers from LinkedIn shared advanced techniques for tuning gRPC services. The talk focused on best practices for managing deadlines in distributed systems to prevent cascading failures, using request batching to improve throughput for CPU-bound workloads, and correctly configuring keep-alives to ensure connection stability with network proxies and load balancers.
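
As a toy illustration of the load-shedding idea from the Netflix talk (not their actual system), a Python server interceptor could reject requests that carry a hypothetical x-priority: low header whenever an overload signal is raised; the sketch below only handles unary-unary methods.

import grpc

class LoadSheddingInterceptor(grpc.ServerInterceptor):
    """Shed requests marked as low priority while the server is overloaded."""

    def __init__(self, is_overloaded):
        self._is_overloaded = is_overloaded  # callable returning True under overload

        def shed(request, context):
            context.abort(grpc.StatusCode.RESOURCE_EXHAUSTED,
                          "shedding low-priority traffic")

        self._shed_handler = grpc.unary_unary_rpc_method_handler(shed)

    def intercept_service(self, continuation, handler_call_details):
        metadata = dict(handler_call_details.invocation_metadata)
        if self._is_overloaded() and metadata.get("x-priority") == "low":
            return self._shed_handler  # short-circuit: never reaches the servicer
        return continuation(handler_call_details)

# usage: grpc.server(thread_pool, interceptors=[LoadSheddingInterceptor(check_load)])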

AI, Cloud, and Kubernetes Integration

  • gRPC in AI Tooling Stacks: This session explored the critical role of gRPC in production-ready AI products. Its speed, efficient streaming, and instant cancellation capabilities are ideal for meeting user expectations of responsive and interactive AI, directly addressing performance pain points like token latency and slow mid-response API calls. At scale, gRPC can be 7-10 times faster than REST for server-to-server communication.

  • Integrating gRPC with Model Context Protocol (MCP): Presenters argued for leveraging gRPC to overcome the limitations of MCP, an emerging protocol for connecting LLMs to external tools. gRPC's proven performance, security, and built-in streaming can provide a more robust and scalable transport for MCP's ad-hoc design.

  • Building an E-commerce Site on Google Cloud: A case study of a fictional company, "Gshu," demonstrated migrating an on-premise application to a globally scalable infrastructure on Google Cloud. The journey covered leveraging managed services for compute (Cloud Run, GKE), databases (Cloud SQL, Spanner), data processing (Dataflow), and advanced machine learning with Vertex AI.

  • Resolving Incidents with Gemini Cloud Assist: A practical demonstration showed how Google's Gemini Cloud Assist can rapidly resolve a critical website outage. The AI-powered tool analyzed system logs, identified the root cause of a changed IP address on a VM, and provided the precise command-line fix, turning a complex troubleshooting task into a guided, minutes-long solution.

  • Kubernetes and GKE from Zero: This introductory talk provided a foundational understanding of Kubernetes and Google Kubernetes Engine (GKE). It explained the concepts from the perspectives of a software developer and a platform team, highlighting how containers and Kubernetes bridge the gap between development and operations by simplifying scaling and software rollouts.

Advanced Topics and Ecosystem Tools

  • Service Meshes and gRPC: This talk explored the interaction between gRPC and various service mesh architectures (sidecar, proxyless, etc.). It emphasized that for gRPC's HTTP/2 traffic, application-aware (L7) load balancing is crucial, and proxyless architectures are ideal for performance-sensitive, gRPC-heavy environments as they eliminate proxy-induced latency.

  • Load Balancing in gRPC: A deep dive into gRPC's native client-side load balancing architecture explained how traffic is distributed across server backends. The session detailed the roles of the Load Balancing Policy, name resolvers, and sub-channels, and provided an overview of the API for creating custom LB policies.

  • gRPC Observability Updates: This presentation focused on recent advancements in gRPC observability, particularly the enhanced integration with OpenTelemetry for tracing and metrics. It also covered a suite of diagnostic tools, including gRPC binary logging for replaying RPCs, grpcurl for command-line interaction, and ChannelZ for inspecting internal channel states.

  • gRPC API Management with Kubernetes: WSO2 presented its Kubernetes Gateway, built on Envoy, designed to decouple management concerns like security, rate limiting, and traffic governance from gRPC applications. The solution allows platform teams to enforce policies without requiring changes to the application code.

  • Bringing HTTP/3 to gRPC at Cloudflare: Cloudflare engineers discussed their implementation of QUIC and HTTP/3 to enhance gRPC at scale. HTTP/3 provides inherent performance and security benefits over HTTP/2, including immunity to certain denial-of-service attacks and reduced head-of-line blocking. Preliminary data showed significant latency reductions with the new protocol.

  • gRPC Rust Update: The gRPC team provided an update on the development of official support for the Rust language. The strategy is to build upon the popular community library, Tonic, by integrating full gRPC features like advanced load balancing and client-side health checking, with a beta release planned for late this year.

  • gRPC's Journey to CNCF Graduation: This talk outlined the strategic overhaul of gRPC's governance structure to meet the requirements for becoming a "Graduated" project within the CNCF. Key changes included implementing a formal contributor ladder and establishing an elected Steering Committee to guide the project's high-level direction.

Sunday, September 17, 2023

How Does Qualcomm's Fingerprint Sensor Work?

Introduction

I recently came across a demonstration of the Samsung S22's fingerprint sensor while browsing used phones. Although it was challenging to find accurate information on how this technology works, I did eventually uncover details about its origins. This article outlines what I've learned.

My Motivation


Having previously worked on medical ultrasound imaging, I find the potential of a mass-produced 2D array of ultrasonic transducers particularly intriguing. Such an array could revolutionize low-cost 3D ultrasound imaging.
 

Link to the Initial Research on the Technology


Around 2015, Bernhard E. Boser's research group was instrumental in developing the piezoelectric micromachined ultrasonic transducer (PMUT). Yipeng Lu, one of the authors of the seminal 2016 paper [1] on this topic, joined Qualcomm as a senior engineer. Recent patents by Qualcomm include contributions from Kostadin Dimitrov Djordjev, Jessica Liu Strohmann, and Nicholas Ian Buchan, but this article focuses solely on information from the 2016 paper.

Additional evidence supporting the presence of PMUT devices in Qualcomm's 3D Sonic comes from a report by System Plus Consulting [2]. This report specifically mentions the use of PMUT technology. Furthermore, a YouTube video [3] provides a visual demonstration, showing the kinds of images achievable with a larger version of the sensor:

Construction of the Sensor


The PMUTs are MEMS devices bonded to a CMOS ASIC equipped with high-voltage (24V) transistors.

How Does the Piezoelectric Micromachined Ultrasonic Transducer (PMUT) Work?


The core of a PMUT is an Aluminum Nitride (AlN) layer sandwiched between electrodes. During transmission, an applied voltage causes the AlN membrane to buckle, emitting an ultrasonic wave. When in reception mode, incoming pressure waves induce a charge across the transducer. The front side of the PMUT is coupled to the finger through a 250µm layer of silicone, while the backside is in a vacuum to prevent emission or losses towards the back.

Array Utilization for Imaging


The sensor comprises an array of 110 rows x 56 columns of PMUTs, arranged at a 43µm x 58µm pitch. This array achieves approximately 500 dpi resolution. Column-wise sequential readout is enabled by 56 analog demodulators. Transmit signals, created by exciting five adjacent columns, are time-delayed to focus the ultrasonic beam. The received signals are then processed to construct the image.

The algorithm employed for image construction is quite rudimentary. It essentially captures the envelope of the received ultrasonic signal at a predetermined time instant, without the use of more sophisticated image reconstruction techniques like beamforming or Fourier-based methods.
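
As a back-of-the-envelope illustration of the transmit focusing and fixed-time sampling described above (not Qualcomm's actual firmware), the firing delays for a five-column aperture follow from simple geometry. The column pitch, sound speed in silicone, and focal depth below are assumptions taken loosely from the 2016 paper.

import numpy as np

# assumed values: 58 µm column pitch, ~1000 m/s sound speed in silicone,
# focus at the finger surface roughly 250 µm above the array
pitch = 58e-6
c = 1000.0
z_focus = 250e-6

# lateral positions of the five excited columns, centered on the beam axis
x = pitch * np.arange(-2, 3)
path = np.sqrt(z_focus**2 + x**2)      # distance from each column to the focus
delays = (path.max() - path) / c       # outer columns fire first
print(delays * 1e9)                    # firing delays in nanoseconds

# fixed-time gating: sample the received envelope at the round-trip time
t_sample = 2 * z_focus / c             # about 0.5 µs for the assumed geometry
print(t_sample * 1e6)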

Conclusion

Pros

Qualcomm's ultrasonic fingerprint sensor stands as a remarkable engineering achievement. Not only can it capture high-resolution fingerprint images swiftly, but it also excels in challenging conditions like wet or oily surfaces and in complete darkness, thanks to its ultrasonic imaging capabilities. Moreover, its energy-efficient design is so optimized that the sensor can double as a power switch for the device, further enhancing its practicality.

Security Considerations

While Qualcomm claims the 3D imaging capabilities of their sensor enhance biometric security, I have some skepticism, particularly concerning the depth imaging. The employed reconstruction algorithm is fairly basic, which brings into question the sensor's resistance to spoofing techniques. This skepticism seems to be supported by existing evidence. Indeed, a brief search led to a report detailing a successful spoofing attack on a similar ultrasonic fingerprint sensor using a 3D-printed finger [4]. Given these factors, the actual security efficacy of Qualcomm's sensor remains an open question.

Future Applications and Hackability

Given its high-volume production and intricate engineering aimed at a specific problem, Qualcomm's sensor is a compelling candidate for other innovative applications. While the sensor is currently designed for fingerprint scanning, one could envision hacking the device to explore further into biological tissues, potentially repurposing it as an ultrasonic tomograph.

The present architecture, if sufficiently modified, could provide much-needed capabilities for 3D imaging in medical applications. Currently, the sensor focuses on a shallow layer of the skin to read fingerprints. However, with alterations in the excitation pulses and the readout strategy, the sensor might be capable of imaging deeper tissue structures.

To turn this into a reality, the sensor's readout would need to be modified to sample each PMUT at higher frequencies, enabling more sophisticated beamforming techniques for imaging. The challenge here would be in the high-speed, high-resolution data acquisition that proper beamforming would require.

Admittedly, such a modification would not be straightforward. These sensors are highly optimized to serve their primary function of fingerprint scanning. Yet, the concept of repurposing a mass-produced, sophisticated piece of hardware like this for medical imaging or other scientific applications is tantalizing. It would certainly require a deep dive into the sensor's architecture and capabilities, but the payoff could be monumental, offering a low-cost solution for more intricate imaging needs.

References

[1] Tang, Hao-Yen, et al. "3-D ultrasonic fingerprint sensor-on-a-chip." IEEE Journal of Solid-State Circuits 51.11 (2016): 2522-2533. https://ieeexplore.ieee.org/abstract/document/7579196
[2] System Plus Consulting Report https://s3.i-micronews.com/uploads/2019/07/SP19465-YOLE_Qualcomm-3D-Sonic-Sensor-Fingerprint_Sample.pdf
[3] YouTube Video https://youtu.be/JeTm5sd8ktg?t=143
[4] Spoofing attack https://www.digitalinformationworld.com/2019/04/samsung-galaxy-s10-ultrasonic-sensor-fingerprint.html

Time-Gating for 7 Frames with a Digital Micromirror Device

Introduction  

The paper titled "Diffraction-gated real-time ultrahigh-speed mapping photography" ([2]) presents a method for capturing seven distinct frames at an ultra-fast speed.

Method  

The study employs a Digital Micromirror Device (DMD), a MEMS-based technology featuring an array of tiny mirrors that can toggle between two orientations at high speeds. Placed in the back focal plane of an imaging system, the DMD produces seven diffraction orders, essentially creating seven copies of the input image.

All mirrors in the DMD array tilt synchronously. During transitions from an "all-off" to an "all-on" state, this synchronous tilt yields a dynamic phase profile. This results in the diffraction envelope sweeping through the seven diffraction orders.

Equation 1 of the paper describes the DMD's diffraction pattern: the intensity of a point target observed in the intermediate image plane. Two sinc functions modulate the diffraction orders: one depends on the width of an individual mirror, the other on the width of the whole device. The time-dependent element is the instantaneous tilt angle θb of the mirrors.

Note: a potential error in the paper could be the cos(mπ) term that sits outside the sum over m.

Performance metrics indicate a frame interval of 0.21 µs ± 0.03 µs, corresponding to 4.8 million frames per second.
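
The sweep of the diffraction envelope across the orders can be pictured with the standard blazed-grating model; the Python sketch below is my own simplification with assumed mirror pitch, mirror width, and wavelength, and is not the paper's Equation 1.

import numpy as np

# assumed DMD geometry: 10.8 µm mirror pitch, ~10 µm effective mirror width,
# 532 nm illumination; the mirrors sweep from -12° to +12°
d, w, lam = 10.8e-6, 10.0e-6, 532e-9
orders = np.arange(-3, 4)                        # seven usable diffraction orders
theta_b = np.radians(np.linspace(-12, 12, 500))  # instantaneous tilt angle over time

sin_m = orders * lam / d          # diffraction angle of order m (grating equation)
sin_env = np.sin(2 * theta_b)     # envelope center steered by the tilting mirrors

# sinc envelope set by the mirror width (np.sinc(x) = sin(pi x)/(pi x));
# each order lights up when the envelope center sweeps past it
intensity = np.sinc(w / lam * (sin_m[None, :] - sin_env[:, None])) ** 2
print(intensity.argmax(axis=0))   # each order peaks at a different time sample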

Applications  

While capturing just seven consecutive frames may seem limited, there are intriguing applications. Given the short integration time, the method requires either high light power or multiple cycles of the DMD mirrors. Potential applications include:

  • Analyzing the ablation patterns of pulsed lasers
  • Investigating the spatial distribution of light modes in a moving multi-mode fiber
  • Studying transient changes in photonic integrated chips, such as superluminescent diodes or lasers with modulated gain sections

References  

- [1] AJD-4500 DMD Controller https://ajile.ca/ajd-4500/
- [2] Optica Paper https://opg.optica.org/optica/fulltext.cfm?uri=optica-10-9-1223&id=538060
- [3] Hacker News Discussion https://news.ycombinator.com/item?id=37523314

Sunday, September 6, 2020

Display Copernicus Sentinel-1 space packet headers as a table with gtkmm


I am working on a project to visualize the raw radar data of the Copernicus Sentinel-1 satellite.

The data comes in a binary format with header information. Each row in the following figure contains header information of one "space packet":

This source code parses the header information and renders the data as a table in a GTK TreeView:

 https://github.com/plops/cl-cpp-generator2/blob/b13e3afbb30a5958f8e2a6d45b35523f02d32fed/example/33_copernicus_gtk/source/vis_00_base.cpp

The column titles are quite long. To fit the table better on the screen, I shorten the strings in the table header and display the full column title as a tooltip.

I find it surprisingly difficult to express this in gtkmm.
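
For reference, the same idea in Python/PyGObject rather than gtkmm (a rough sketch only; whether the tooltip has to sit on the label or on the header button may depend on the GTK version):

import gi
gi.require_version("Gtk", "3.0")
from gi.repository import Gtk

def add_column(tree_view, model_index, short_title, full_title):
    renderer = Gtk.CellRendererText()
    column = Gtk.TreeViewColumn(short_title, renderer, text=model_index)

    # replace the plain title with a label so a tooltip can be attached
    label = Gtk.Label(label=short_title)
    label.set_tooltip_text(full_title)
    label.show()
    column.set_widget(label)

    tree_view.append_column(column)
    # the header button exists once the column is attached; putting the
    # tooltip there as well covers the whole clickable header area
    button = column.get_button()
    if button is not None:
        button.set_tooltip_text(full_title)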

 

Next I want to plot some of the data. I found a dataset that was acquired in stripmap mode and contains several good point spread functions (ships in water).



Sunday, August 16, 2020

Google Protocol Buffers for serial communication with a microcontroller


For a while now I have been thinking about implementing a protobuf-based binary protocol in the firmware of a microcontroller.

This would have the following advantages:

  • proto files as specification
  • parser generators for many languages (Python, C)
  • versioning/backward compatibility
  • faster transfers


As a proof of concept I implemented a single message that the microcontroller sends to the host computer using the following packet format.
The packet starts with five "U" (0x55) characters and ends with five 0xff bytes:

0x55 0x55 0x55 0x55 0x55 <len_lsb> <len_msb> <payload_bytes...> 0xff 0xff 0xff 0xff 0xff


This preamble looks good in the logic analyzer and I think I can eventually use it for baudrate estimation.


A 16-bit packet length is included, and the payload data is encoded according to a Google Protocol Buffers definition:
https://github.com/plops/cl-cpp-generator2/blob/master/example/29_stm32nucleo/source/simple.proto


syntax = "proto2";

import "nanopb.proto";

message SimpleMessage {
  required uint32 id = 1;
  required uint32 timestamp = 2;
  required uint32 phase = 3;
  repeated int32 int32value = 4;
  required int32 sample00 = 5;
  ...
  required int32 sample59 = 64;
};





The packet encoder on the MCU uses nanopb (https://github.com/nanopb/nanopb). It generates encoders and parsers that are compatible with Google Protocol Buffers, but the code is C instead of C++ and the code size is small.


https://github.com/plops/cl-cpp-generator2/blob/master/example/29_stm32nucleo/source/boilerplate/main.c



The packet is generated around line 524 (search for "SimpleMessage").


Initially I couldn't figure out how to fill a variable-length array. That is why I have the sample00..sample59 variables. Later I learned how to store an array in int32value (line 531), using the callback encode_int32.


The Python code
https://github.com/plops/cl-cpp-generator2/blob/master/example/29_stm32nucleo/source2/run_00_uart.py
feeds all received bytes into a finite state machine that searches for the five "U" character preamble, parses the packet length, and decodes the payload data with code that is auto-generated by Google protobuf from simple.proto.
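
The framing logic amounts to a state machine roughly like the sketch below (the real script is in the repository linked above; simple_pb2 is the module that protoc generates from simple.proto).

import simple_pb2  # generated with: protoc --python_out=. simple.proto

PREAMBLE = b"\x55" * 5  # five "U" characters

class PacketParser:
    """Feed raw serial bytes in, get decoded SimpleMessage objects out."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        self._buf += data
        messages = []
        while True:
            start = self._buf.find(PREAMBLE)
            if start < 0 or len(self._buf) < start + 7:
                break  # no preamble yet, or the two length bytes are incomplete
            length = self._buf[start + 5] | (self._buf[start + 6] << 8)  # LSB first
            end = start + 7 + length
            if len(self._buf) < end:
                break  # payload not complete yet
            payload = bytes(self._buf[start + 7:end])
            del self._buf[:end]  # the 0xff trailer is skipped by the next find()
            msg = simple_pb2.SimpleMessage()
            msg.ParseFromString(payload)
            messages.append(msg)
        return messages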

 

Received packets look like this after decoding:


id: 1431655765
timestamp: 4281899953 
phase: 16
int32value: 42
int32value: 43
int32value: 44
int32value: 45
int32value: 46
int32value: 47
int32value: 48
int32value: 49
int32value: 50
int32value: 51
sample00: 193
sample01: 193
...
sample52: 206
sample53: 1999
sample54: 1776
sample55: 300
sample56: 205
sample57: 196
sample58: 193
sample59: 190






The MCU code itself solves a toy problem. I generate output like this:
4095 0 0 4095 0 0 0 4095 0 0 0 ...
on the DAC and read it back with the ADC. Each sample of the ADC is integrated for 2.5 cycles of the 80 MHz system clock.


I shift the ADC trigger relative to the DAC output clock with a PWM timer.


My goal was to investigate the influence of the DAC output buffer.



With the DAC output buffer enabled, the ADC samples do not change much when the ADC acquires the data with a delay after the DAC has settled.

Without the DAC buffer, the ADC signal gets lower the longer the delay between the DAC update and the ADC acquisition trigger.

Sunday, July 12, 2020

ARPACK sparse eigenvalues and GPU

I am trying to learn how to find eigenvalues and eigenvectors of a sparse matrix. A good example problem is the electronic orbitals of the hydrogen atom. This paper gives a good introduction: https://www.mdpi.com/2218-2004/6/2/22/pdf
The structure of the matrix is like this:


A CUDA kernel that implements the matrix vector product looks like this: 

I use ARPACK++ with its reverse communication interface to compute the matrix-vector products on an RTX 2060 GPU. The results are similar to the benchmark results listed in the paper:
The radial component of the wavefunction of the ground state looks like this:


The source code is available on GitHub. Note that the C++ and CUDA code is generated from Common Lisp source: https://github.com/plops/cl-cpp-generator2/blob/master/example/27_sparse_eigen_hydrogen/gen00.lisp
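
For comparison, the same kind of eigenproblem can be fed to ARPACK on the CPU through SciPy; the sketch below builds a radial finite-difference Hamiltonian for hydrogen (l = 0, atomic units) and is only an illustration of the approach, not the code from this post (which uses ARPACK++ and a custom CUDA kernel). Grid size and box radius are arbitrary choices.

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

N, R = 4000, 80.0               # number of grid points, box radius in Bohr radii
h = R / (N + 1)
r = h * np.arange(1, N + 1)

# H = -1/2 d^2/dr^2 - 1/r acting on u(r) = r * psi(r), with u(0) = u(R) = 0
main = 1.0 / h**2 - 1.0 / r
off = -0.5 / h**2 * np.ones(N - 1)
H = diags([off, main, off], [-1, 0, 1])

vals, vecs = eigsh(H, k=3, which="SA")   # lowest few eigenvalues via ARPACK
print(vals)                              # roughly -0.5, -0.125, -0.0556 Hartree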

Sunday, May 5, 2019

Numerical derivative of analytic functions

I recently learned about a neat trick to compute the derivative of an analytic function. Others [1] have described it better than I could, so I just keep this here as a reminder.
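
The trick is complex-step differentiation: evaluate the function at x + ih for a tiny step h and take the imaginary part, which avoids the subtractive cancellation of ordinary finite differences. A minimal sketch:

import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    # f must be analytic and accept complex arguments
    return np.imag(f(x + 1j * h)) / h

# example: d/dx sin(x) at x = 0.7 should equal cos(0.7)
print(complex_step_derivative(np.sin, 0.7), np.cos(0.7))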


[1] https://blogs.mathworks.com/cleve/2013/10/14/complex-step-differentiation/