
28 posts tagged with "Engineering"


· 7 min read
Jeffrey Chupp

Implementing a language server is so easy that we're going to do it in Bash.

You really shouldn't write your language server in Bash, but we'll show that you could.

The minimum viable language server needs to

  1. receive JSON-RPC requests from the client
  2. respond with JSON-RPC to client requests

Like a good bash program, we'll talk over stdin and stdout.

Mad scientist holding a bash language server

JSON-RPC

JSON-RPC is simply a protocol for communication over JSON.

  • A request message includes an id, params, and the method to be invoked.
  • A response message includes the id from the request and either a result or error payload.
  • The LSP adds on the additional requirement of a header specifying the Content-Length of the message.

An example request might look like

Content-Length: 226\r\n
\r\n
{"jsonrpc":"2.0","method":"initialize","id":1,"params":{"trace":"off","processId":2729,"capabilities":[],"workspaceFolders":null,"rootUri":null,"rootPath":null,"clientInfo":{"version":"0.10.0-dev+Homebrew","name":"Neovim"}}}

An example response might look like

Content-Length: 114\r\n
\r\n
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "capabilities": {
      "completionProvider": {}
    }
  }
}

For our language server in bash, we'll write the following function:

respond() {
  local body="$1"
  local length=${#body}
  local response="Content-Length: $length\r\n\r\n$body"

  echo -e "$response"
}

This will take a JSON string as an argument and echo it out. The -e ensures our line breaks work as intended.
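
For example, calling it with a small body (just an illustration) produces a framed message like this:

respond '{"jsonrpc":"2.0","id":1,"result":{}}'
# => Content-Length: 36
# =>
# => {"jsonrpc":"2.0","id":1,"result":{}}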

Listening for messages and parsing them

Our language server will listen for messages on stdin and write messages on stdout.

Let's name the bash script /tmp/bash-ls and chmod +x it.

I'll connect it to my editor, Neovim, using

vim.lsp.start {
  name = "Bash LS",
  cmd = { "/tmp/bash-ls" },
  capabilities = vim.lsp.protocol.make_client_capabilities(),
}

Now, we'll work on our read/print loop.

We'll start with the Bash classic

while IFS= read -r line; do

This gives us a value for $line that looks like Content-Length: 3386

The content length will vary based on the capabilities of your editor, but the gist here is that we need to read 3386 characters to get the entire JSON payload.

Let's extract the content length number

while IFS= read -r line; do
  # Capture the content-length header value
  [[ "$line" =~ ^Content-Length:\ ([0-9]+) ]]
  length="${BASH_REMATCH[1]}"

We need to add 2 to that number to account for the \r\n separator that follows the Content-Length header. So we'll add length=$((length + 2))

Now we're ready to read the JSON payload:

while IFS= read -r line; do
  # Capture the content-length header value
  [[ "$line" =~ ^Content-Length:\ ([0-9]+) ]]
  length="${BASH_REMATCH[1]}"

  # account for \r at end of header
  length=$((length + 2))

  # Read the message based on the Content-Length value
  json_payload=$(head -c "$length")

Remember that JSON-RPC requires us to include the id of the request message in our response. We could write some convoluted JSON parsing in bash to extract the id, but we'll lean on jq instead.

while IFS= read -r line; do
  # Capture the content-length header value
  [[ "$line" =~ ^Content-Length:\ ([0-9]+) ]]
  length="${BASH_REMATCH[1]}"

  # account for \r at end of header
  length=$((length + 2))

  # Read the message based on the Content-Length value
  json_payload=$(head -c "$length")

  # We need -E here because jq fails on newline chars -- https://github.com/jqlang/jq/issues/1049
  id=$(echo -E "$json_payload" | jq -r '.id')

Now, we have everything we need to read and reply to our first message.

The initialize method

The first message sent by the client is the initialize method. It describes the client's capabilities to the server.

You can think of this message as saying, "Hey, language server, here are all the features I support!"

The server replies with, "Oh, hi, client. Given the things you support, here are the things I know how to handle."

Well, that's how it should work, anyway. For our MVP here, we'll provide a canned response with an empty capabilities section.

while IFS= read -r line; do
  # Capture the content-length header value
  [[ "$line" =~ ^Content-Length:\ ([0-9]+) ]]
  length="${BASH_REMATCH[1]}"

  # account for \r at end of header
  length=$((length + 2))

  # Read the message based on the Content-Length value
  json_payload=$(head -c "$length")

  # We need -E here because jq fails on newline chars -- https://github.com/jqlang/jq/issues/1049
  id=$(echo -E "$json_payload" | jq -r '.id')
  method=$(echo -E "$json_payload" | jq -r '.method')

  case "$method" in
    'initialize')
      respond '{
        "jsonrpc": "2.0",
        "id": '"$id"',
        "result": {
          "capabilities": {}
        }
      }'
      ;;

    *) ;;
  esac
done

We pluck out the request's method and use a case statement to reply to the correct method. If we don't support the method, we don't respond to the client.

If we didn't use a case statement here and always replied with our canned message, we'd make it past initialization, but then the client would get confused when we respond to (e.g.) its request for text completions with an initialize result.

That's all you need for a minimum viable language server built in Bash. It doesn't do anything besides the initialization handshake, but it works.

Adding completion

A language server that doesn't do anything is no fun, so let's teach it how to respond to textDocument/completion to offer text completions.

First, we'll need to modify our capabilities in our initialize response to indicate that we support completion:

          "result": {
            "capabilities": {
              "completionProvider": {}
            }
          }

We'll start with hardcoded results to verify things work. This is as easy as adding a new condition to our case statement.

  'textDocument/completion')
    respond '{
      "jsonrpc": "2.0",
      "id": '"$id"',
      "result": {
        "isIncomplete": false,
        "items": [
          { "label": "a sample completion item" },
          { "label": "another sample completion item" }
        ]
      }
    }'
    ;;
Hardcoded completions

That works as we hoped. Let's jazz it up a little by completing the first 1000 words from the dict file on macOS (your path may differ).

Here's the final version of the script:

#!/bin/bash

respond() {
  local body="$1"
  local length=${#body}
  local response="Content-Length: $length\r\n\r\n$body"

  echo "$response" >>/tmp/out.log

  echo -e "$response"
}

completions=$(head </usr/share/dict/words -n 1000 | jq --raw-input --slurp 'split("\n")[:-1] | map({ label: . })')

while IFS= read -r line; do
  # Capture the content-length header value
  [[ "$line" =~ ^Content-Length:\ ([0-9]+) ]]
  length="${BASH_REMATCH[1]}"

  # account for \r at end of header
  length=$((length + 2))

  # Read the message based on the Content-Length value
  json_payload=$(head -c "$length")

  # We need -E here because jq fails on newline chars -- https://github.com/jqlang/jq/issues/1049
  id=$(echo -E "$json_payload" | jq -r '.id')
  method=$(echo -E "$json_payload" | jq -r '.method')

  case "$method" in
    'initialize')
      respond '{
        "jsonrpc": "2.0",
        "id": '"$id"',
        "result": {
          "capabilities": {
            "completionProvider": {}
          }
        }
      }'
      ;;

    'textDocument/completion')
      respond '{
        "jsonrpc": "2.0",
        "id": '"$id"',
        "result": {
          "isIncomplete": false,
          "items": '"$completions"'
        }
      }'
      ;;

    *) ;;
  esac
done
Dictionary completions

Lovely.

Closing

In 56 lines of bash, we've implemented a usable (if boring) language server.

I wouldn't advocate anyone writing a serious language server in bash. Hopefully, this has illustrated how easy it is to get started with language servers and has made the LSP and JSON-RPC a little less magical.

What language would I recommend for writing a language server? That's probably a whole article in itself, but the short answer is

  1. All things being equal, choose TypeScript. The first-party libraries (e.g., vscode-languageserver-node) are written in TypeScript, and the community and ecosystem are excellent.
  2. If you don't want to use TypeScript, use whatever language you're most productive in. There's probably already a library for writing a language server in your preferred language, but if there isn't, you now know how easy it is to write it yourself.

If you'd like to be notified when I publish more LSP content, sign up for my newsletter.

· 5 min read
Andrew Yip

This post will walk you through setting up Prefab feature flags in your React app, and creating a flag for a gradual rollout of a new feature.

Creating a feature flag in the Prefab dashboard

If you don't have one already, start by creating a free Prefab account. Once you sign in, this is what you'll see

Prefab dashboard

To create your first feature flag, click on Flags in the left nav and you'll be brought to the Flags index page.

Prefab new feature flag page

Click Add Flag to create your first flag. We'll name it flag.my-new-feature and leave the type as bool, then click Save.

Prefab feature flag settings

Once you Save the flag name, you'll see the flag Variants and flag Rules. Since this is a boolean flag, the variants aren't editable, but for other flag types such as string you can name as many variants as you want.

Prefab feature flag rules

Now let's add a rule under the Development environment:

  1. Click Add Rule
  2. Change the variant of the new rule to true
  3. Enter user.email in the property field
  4. Change the logic operator to ends with one of
  5. Enter @prefab.cloud in the values field
  6. Click Save to save your new rule
info

Prefab creates the Development, Staging, and Production environments automatically, but you can edit, add, or delete environments to meet your needs.

Prefab feature flag rule

There's one more step. You'll need to click Publish Changes to make your new rule live.

Prefab feature flag publish

That's it! Your new flag is ready to use, but there's one more step before we can write some code.

Creating an API key

You need to create an API key before you can access Prefab from your code. Click on Environments in the left nav. API Keys belong to environments.

Prefab environments page

Since we're using React, you'll want to create a Client key. You can learn more about client vs. server keys in our docs. Click Add Client API Key under the Development environment and make sure you write down the key. It will only be displayed once.

Setting up Prefab in your React app

Install the latest version

Use your favorite package manager to install @prefab-cloud/prefab-cloud-react npm | github

npm install @prefab-cloud/prefab-cloud-react

Initialize the client

Wrap your component tree in the PrefabProvider. This will enable you to use our hooks when you want to evaluate a feature flag.

import { PrefabProvider } from "@prefab-cloud/prefab-cloud-react";

const WrappedApp = () => {
  const onError = (reason) => {
    console.error(reason);
  };

  return (
    <PrefabProvider apiKey={"YOUR_CLIENT_API_KEY"} onError={onError}>
      <MyApp />
    </PrefabProvider>
  );
};

Using hooks to evaluate flags

Now use the usePrefab hook to fetch flags. isEnabled is a convenience method for boolean flags.

const NewFeatureButton = () => {
  const { isEnabled } = usePrefab();

  if (isEnabled("flag.my-new-feature")) {
    return <button>Do the thing!</button>;
  }

  return null;
};
info

Prefab gives you a single source of truth for feature flags, so your backend and frontend code are always in sync about which variant to assign.

See the result

If we render the NewFeatureButton component, we see nothing. Uh oh, is our flag working? Well, if you remember the rule we set up, the default value of the flag is false. It will only be true if the user's email ends with @prefab.cloud, but so far we haven't told Prefab anything about the email address. This is where Context comes in.

Add context to match the Prefab rule

Context is a powerful mechanism that lets you supply metadata about users, teams, devices, or any other entity that's important. You can then use this data for rule targeting. Let's add some context. Usually you'll want to define context once when you set up PrefabProvider.

import { PrefabProvider } from "@prefab-cloud/prefab-cloud-react";

const WrappedApp = () => {
  const contextAttributes = {
    user: { email: "me@prefab.cloud" },
  };

  const onError = (reason) => {
    console.error(reason);
  };

  return (
    <PrefabProvider
      apiKey={"YOUR_CLIENT_API_KEY"}
      contextAttributes={contextAttributes}
      onError={onError}
    >
      <App />
    </PrefabProvider>
  );
};

Now try rendering NewFeatureButton again. You should see our button!

Making a rollout for Production

Let's try one more thing. Suppose you want to gradually release this feature to your users. Prefab supports rollouts that will allocate users in a percentage split (also great for A/B testing!).

Go back to our feature flag in the Prefab dashboard, and click Edit Rules.

Prefab saved rule

Click on the Production tab, then click on the current variant (false) and change it to Rollout.

Prefab create rollout

Here you can set the percentage of users that should receive each variant. You also need to choose a sticky property, which is a context property used to consistently assign a given user to a variant. In this case we can enter user.email, since that will be unique and we're already providing it as context. However, if you have a user tracking ID, that is often the best choice.
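
Under the hood, rollouts like this generally work by hashing the sticky property into a stable number. Here's a rough sketch of that idea (illustrative only, not Prefab's actual assignment algorithm):

import { createHash } from "crypto";

// Hash the flag key plus the sticky property into a number in [0, 1)
const bucket = (flagKey, stickyValue) => {
  const hash = createHash("md5").update(`${flagKey}:${stickyValue}`).digest();
  return hash.readUInt32BE(0) / 0xffffffff;
};

// A 20% rollout: the same email always lands in the same bucket,
// so the user sees a consistent variant across visits.
const isInRollout = bucket("flag.my-new-feature", "me@prefab.cloud") < 0.2;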

Testing

What about unit testing? Prefab supplies a PrefabTestProvider component that you can use to force specific variants when setting up tests. You can learn more about testing in our docs.
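
For example, a test might force the flag on like this (the config prop that maps flag names to values is an assumption here; check the testing docs for the exact prop name in your SDK version):

import { render, screen } from "@testing-library/react";
import { PrefabTestProvider } from "@prefab-cloud/prefab-cloud-react";

it("shows the button when the flag is enabled", () => {
  render(
    <PrefabTestProvider config={{ "flag.my-new-feature": true }}>
      <NewFeatureButton />
    </PrefabTestProvider>,
  );

  expect(screen.getByText("Do the thing!")).toBeInTheDocument();
});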

· 4 min read
James Kebinger
feature flag by cloud host

Prefab’s feature flag evaluation charts let you see the distribution of feature flag evaluations over time - nice for seeing when those configurations have changed or for watching automatic rollouts proceed.

How Does It Work (With Date Functions)

  1. The clients send telemetry to Prefab HQ containing evaluation counts in short time windows (5-15 seconds), which we store as-is in a (GoogleSQL-dialect) Spanner table [^1]. Each row contains start_at and end_at timestamps plus counts and various data about the evaluation.
  2. At query time, we truncate each row’s start_at timestamp to the start of the hour (or day, or week) in the user’s timezone using the timestamp_trunc method [^2] and sum the counts, grouping by the truncated timestamp with a (simplified) query like this:
SELECT
  timestamp_trunc(start_at, HOUR, 'America/Chicago'),
  config_id,
  SUM(count)
FROM table_name
GROUP BY 1, 2
  3. That gives us all the time intervals that have data - but it will completely exclude any time intervals that don’t have any rows in them.
tip

The front-end charting library doesn’t naturally handle sparse data, so to be a nice API, we should fill in the zero intervals. In databases with series-generation functions we can generate all the time intervals and left join the data [^3]. Spanner doesn’t have that so we build the complete set of time intervals in the API code [^4] and then do the equivalent of a LEFT JOIN there.
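
For reference, here's what that zero-filling looks like in a database that does have a series generator (a Postgres-style sketch with made-up window bounds; Spanner lacks generate_series, which is why we do this in the API layer instead):

SELECT
  buckets.bucket_start,
  COALESCE(SUM(t.count), 0) AS total
FROM generate_series(
       timestamptz '2023-08-24 00:00:00-05',
       timestamptz '2023-08-25 00:00:00-05',
       interval '1 hour') AS buckets(bucket_start)
LEFT JOIN table_name t
       ON date_trunc('hour', t.start_at) = buckets.bucket_start
GROUP BY 1
ORDER BY 1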

Again, Without In-Database Date Functions

Faced with the prospect of no date/timestamp truncation method in Postgres-dialect Spanner, I switched this use case to the GoogleSQL dialect [^5]. A few days later, when I was solving the missing time interval problem (see above), I had a lightbulb moment where I saw how I could do the same truncation without timestamp_trunc and friends, and potentially more flexibly.

Starting with the assumption that all the rows containing data have a start_at column in UTC milliseconds:

  1. Use a date library in your application to convert the user’s selected start time into a UTC timestamp (call this chart_start_at)

  2. Calculate the milliseconds in each selected time period (e.g., 1 hour = 3,600,000 ms) - this is our interval_ms.

  3. Subtract chart_start_at from every start_at and then use integer division to calculate integral multiples of the interval from the start time: (start_at - chart_start_at) / interval_ms

  4. Group and sum the data by those integral offsets. In GoogleSQL-dialect Spanner that’ll look like this (where 1692907200000 is the chart_start_at and 3600000 is an hour in milliseconds):

    SELECT
      DIV(start_at - 1692907200000, 3600000) AS start_at_interval_multiples,
      config_id,
      SUM(count)
    FROM table_name
    WHERE start_at >= 1692907200000 ...
    GROUP BY 1, 2
  5. This will generate a series of rows keyed by multiples of the interval since the start time. You’ll need to reconstitute those into real timestamps at some point, either in SQL or in your application. To do that, multiply by the interval and add the start time back: ((start_at - chart_start_at) / interval_ms) * interval_ms + chart_start_at

One nice thing about this approach is that it's easy to customize the interval beyond what may be possible with an in-database function - want to roll up in the database to 4-hour intervals? Go for it!
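
For example, a 4-hour rollup is just a different interval constant (4 x 3,600,000 ms = 14,400,000 ms) in the same query:

    SELECT
      DIV(start_at - 1692907200000, 14400000) AS start_at_interval_multiples,
      config_id,
      SUM(count)
    FROM table_name
    WHERE start_at >= 1692907200000 ...
    GROUP BY 1, 2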

Notes:

[^1] One day it’ll make sense to periodically pre-roll up this data to, e.g., 15-minute intervals, but we haven't done that yet. Note that 15 minutes is the largest practical pre-rolled-up interval because some timezones are offset from UTC in 15-minute increments rather than whole hours.

[^2] timestamp_trunc Spanner docs

[^3] I used and wrote about this approach a couple of jobs ago

[^4] Our time span generator

[^5] Not the only reason! See Friends Don’t Let Friends Use PostgreSQL Dialect Google Spanner

· 2 min read
James Kebinger
Getting Headers in the Right Place

CDNs are great, but they can be a bit of a black box. What if you want to see your CDN traffic broken down by customer? The CDN has that information, but how can we get it out?

Fastly’s CDN hosts our customers’ configuration data so that it is available even if Prefab’s API were to be unavailable. In order to understand our CDN traffic broken down by customer, we set out to use Fastly’s logging product to stream data back to us containing information from our API responses, but hit a bump in understanding the Varnish/Fastly object model.

First Problem: Customer Identification

The default Fastly Log format doesn’t contain a lot of data. To make downstream computation more straightforward, we set out to add a couple of headers to our API’s configuration responses that would identify the API key used to request the configuration. Those are the prefab-key-id and prefab-env-id.

Second Problem: Getting New Headers Back In Fastly Logs

First, I thought it would be as easy as adding something like "prefab_env_key_id": "%{beresp.http.Prefab-EnvId}V" to the JSON log format to get the new headers shipped our way, but I was confronted with an error message indicating the vcl_log method didn’t have access to that variable. I asked a question in the Fastly Developers community and got a helpful answer.

Varnish All The Way Down

Fastly is a great user experience wrapped around the Varnish HTTP Cache, which shows in its references to VCL (Varnish Configuration Language) snippets and callbacks like vcl_fetch, vcl_deliver, and vcl_log. Each of those callbacks (subroutines) has access to different objects, so the trick is to use those subroutines to copy data from the original (origin) response (known as beresp) in Varnish to the cache object resp so that future invocations of vcl_log have access to it.

In Fastly Configuration, Content → Headers lets us do this without writing any VCL.

Fastly UI

Now that the header Prefab-EnvId is set on the cached object, we can refer to that using resp.http.Prefab-EnvId in the JSON logging format like this

"pf_env_id": "%{json.escape(resp.http.Prefab-EnvId)}V"

· 2 min read
James Kebinger
Dogs in the right boxes

After months with all of our Datadog alerts for both staging and production routed to a single Slack #alerts channel, we all got tired of urgently checking that room only to find something happening in staging again. The team asked me to figure out how to create environment-specific alert channels, so here’s what I did.

The Documentation

The Slack integration document mentions using message template variable interpolation to dynamically create @-mentions.

Best practices for tagging your infrastructure and applications is a long read on the virtues of adding tags (like team ownership) to infrastructure so those tags can be used in alerts.

TL;DR

Once you’ve set up a monitor with a Slack alert, you’ll see something like this in the message: @slack-Prefab_Internal-alerts. That’s a hyphen-delimited integration name, integration install name, and then the channel.

To target specific environments by the env tag that’s usually available, edit the monitor’s message like this @slack-Prefab_Internal-alerts-{{env.name}}. The drop-down box below the message won’t be helpful, but this works.

Then set up a Slack channel like #alerts-production.

datadog UI for notifications

You may also want to specify the “Multi Alert” aggregation options we chose.
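
Putting it together, a complete monitor message might look something like this (the conditional {{#is_alert}}/{{#is_recovery}} blocks are optional; the last line is what routes the notification, and the alert text is made up for the example):

{{#is_alert}}
p95 latency is above threshold in {{env.name}}.
{{/is_alert}}
{{#is_recovery}}
p95 latency has recovered in {{env.name}}.
{{/is_recovery}}
@slack-Prefab_Internal-alerts-{{env.name}}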

· 7 min read
Jeffrey Chupp

In Hands-On LSP Tutorial: Building a Custom Auto-Complete, we built a language server to provide our text editors with GitHub Issue and PR completion.

Final code from that post
import fetch from "node-fetch";
import {
createConnection,
InitializeResult,
TextDocumentSyncKind,
TextDocuments,
} from "vscode-languageserver/node";
import { TextDocument } from "vscode-languageserver-textdocument";

const connection = createConnection();

const documents: TextDocuments<TextDocument> = new TextDocuments(TextDocument);

type Edge = {
node: {
title: string;
number: number;
url: string;
body: string;
};
};

type Response = {
data: { search: { edges: Edge[] } };
};

// Replace with your real GitHub access token
const accessToken = "YOUR_GITHUB_ACCESS_TOKEN";

const searchScope = "repo:facebook/docusaurus";

const search = async (input: string): Promise<Response> => {
const query = `
{
search(query: "${searchScope} ${input}", type: ISSUE, first: 10) {
edges {
node {
... on PullRequest {
title
number
body
url
}
... on Issue {
title
number
body
url
}
}
}
}
}
`;

const response = await fetch("https://api.github.com/graphql", {
method: "POST",
body: JSON.stringify({ query }),
headers: {
Authorization: `Bearer ${accessToken}`,
},
});

if (!response.ok) {
const error = {
status: response.status,
statusText: response.statusText,
body: await response.json(),
};

connection.console.error(
`Error fetching from GitHub: ${JSON.stringify(error)}`,
);
}

return await response.json();
};

connection.onInitialize(() => {
const result: InitializeResult = {
capabilities: {
textDocumentSync: TextDocumentSyncKind.Incremental,
// Tell the client that the server supports code completion
completionProvider: {
// We want to listen for the "#" character to trigger our completions
triggerCharacters: ["#"],
},
},
};

return result;
});

connection.onCompletion(async (params) => {
const doc = documents.get(params.textDocument.uri);

if (!doc) {
return null;
}

const line = doc.getText().split("\n")[params.position.line];

// return early if we don't see our trigger character
if (!line.includes("#")) {
return null;
}

const input = line.slice(line.lastIndexOf("#") + 1);

const json = await search(input);

const items = json.data.search.edges.map((edge) => {
return {
label: `#${edge.node.number}: ${edge.node.title}`,
detail: `${edge.node.title}\n\n${edge.node.body}`,
textEdit: {
range: {
start: {
line: params.position.line,
character: line.lastIndexOf("#"),
},
end: {
line: params.position.line,
character: params.position.character,
},
},
newText: `${edge.node.url}`,
},
};
});

return {
isIncomplete: true,
items,
};
});

// Make the text document manager listen on the connection
// for open, change and close text document events
documents.listen(connection);

// Listen on the connection
connection.listen();

It works, but searchScope (e.g., repo:facebook/docusaurus) and our GitHub access token are hard-coded. That's fine for local development, but to get this language server ready to share, we should make those configurable.

Approach

  1. We'll update server.ts to use editor-provided config instead of hard-coded values.
  2. We'll update our editor setup to provide config to the server.

Updating our server to use config

First, we remove previous assignments of accessToken and searchScope.

- // Replace with your real GitHub access token
- const accessToken = "YOUR_GITHUB_ACCESS_TOKEN";
-
- const searchScope = "repo:facebook/docusaurus";

Instead, we specify the shape of our configuration, a default number of max results, and default our config to an empty object.

interface Config {
  githubAccessToken?: string;
  searchScope?: string;
  maxResults?: number;
}

const DEFAULT_MAX_RESULTS = 10;

let config: Config = {};

Next, we change our search query to use the searchScope and maxResults from our settings.

- const search = async (input: string): Promise<Response> => {
-   const query = `
-   {
-     search(query: "${searchScope} ${input}", type: ISSUE, first: 10) {
const search = async (input: string): Promise<Response> => {
  const query = `
  {
    search(query: "${config.searchScope} ${input}", type: ISSUE, first: ${
      config.maxResults ?? DEFAULT_MAX_RESULTS
    }) {

We use the token from our config.

- Authorization: `Bearer ${accessToken}`,
Authorization: `Bearer ${config.githubAccessToken}`,

We'll add a getConfig function to get the current config from the editor. We look up our configuration by a prefix we specify (here, myLSP).

const getConfig = async () => {
  config = await connection.workspace.getConfiguration("myLSP");
};

Finally, we call getConfig in our onCompletion function and we return early with an error if we're missing any required fields.

connection.onCompletion(async (params) => {
  await getConfig();

  if (!config.githubAccessToken || !config.searchScope) {
    connection.console.error("Both githubAccessToken and searchScope required");
    return;
  }
  // ...

That's all we needed to change for our server.

Updated full server.ts code
import fetch from "node-fetch";
import {
createConnection,
InitializeResult,
TextDocumentSyncKind,
TextDocuments,
} from "vscode-languageserver/node";
import { TextDocument } from "vscode-languageserver-textdocument";

const connection = createConnection();

const documents: TextDocuments<TextDocument> = new TextDocuments(TextDocument);

type Edge = {
node: {
title: string;
number: number;
url: string;
body: string;
};
};

type Response = {
data: { search: { edges: Edge[] } };
};

interface Config {
githubAccessToken?: string;
searchScope?: string;
maxResults?: number;
}

const DEFAULT_MAX_RESULTS = 10;

let config: Config = {};

const search = async (input: string): Promise<Response> => {
const query = `
{
search(query: "${config.searchScope} ${input}", type: ISSUE, first: ${
config.maxResults ?? DEFAULT_MAX_RESULTS
}) {
edges {
node {
... on PullRequest {
title
number
body
url
}
... on Issue {
title
number
body
url
}
}
}
}
}
`;

const response = await fetch("https://api.github.com/graphql", {
method: "POST",
body: JSON.stringify({ query }),
headers: {
Authorization: `Bearer ${config.githubAccessToken}`,
},
});

if (!response.ok) {
const error = {
status: response.status,
statusText: response.statusText,
body: await response.json(),
};

connection.console.error(
`Error fetching from GitHub: ${JSON.stringify(error)}`,
);
}

return await response.json();
};

const getConfig = async () => {
config = await connection.workspace.getConfiguration("myLSP");
};

connection.onInitialize(() => {
const result: InitializeResult = {
capabilities: {
textDocumentSync: TextDocumentSyncKind.Incremental,
// Tell the client that the server supports code completion
completionProvider: {
// We want to listen for the "#" character to trigger our completions
triggerCharacters: ["#"],
},
},
};

return result;
});

connection.onCompletion(async (params) => {
await getConfig();

if (!config.githubAccessToken || !config.searchScope) {
connection.console.error("Both githubAccessToken and searchScope required");
return;
}

const doc = documents.get(params.textDocument.uri);

if (!doc) {
return null;
}

const line = doc.getText().split("\n")[params.position.line];

// return early if we don't see our trigger character
if (!line.includes("#")) {
return null;
}

const input = line.slice(line.lastIndexOf("#") + 1);

const json = await search(input);

const items = json.data.search.edges.map((edge) => {
return {
label: `#${edge.node.number}: ${edge.node.title}`,
detail: `${edge.node.title}\n\n${edge.node.body}`,
textEdit: {
range: {
start: {
line: params.position.line,
character: line.lastIndexOf("#"),
},
end: {
line: params.position.line,
character: params.position.character,
},
},
newText: `${edge.node.url}`,
},
};
});

return {
isIncomplete: true,
items,
};
});

// Make the text document manager listen on the connection
// for open, change and close text document events
documents.listen(connection);

// Listen on the connection
connection.listen();

Providing config from the editor

As in the previous post, I will cover VS Code and Neovim here.

How to configure your server with VS Code

VS Code extensions specify configuration via the root package.json. Replace your contributes block with the following:

"contributes": {
  "configuration": {
    "title": "My Extension Name",
    "properties": {
      "myLSP.githubAccessToken": {
        "type": "string",
        "default": "",
        "description": "Your GitHub access token"
      },
      "myLSP.searchScope": {
        "type": "string",
        "default": "repo:facebook/docusaurus",
        "description": "Search scope for GitHub PRs/Issues"
      },
      "myLSP.maxResults": {
        "type": "number",
        "default": 10,
        "description": "Max number of completion suggestions"
      }
    }
  }
},

Note that the myLSP prefix on the properties matches the string we use in our getConfig function.

Once you've repackaged and reinstalled your extension, you can modify the settings by pressing ctrl+shift+p (cmd+shift+p on Mac) and typing "Open Settings UI"

Once in the Settings UI, search for myLSP to modify your settings.

vs-code example

Changes take effect immediately because we pull the settings in our server as needed.
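
If you'd rather not fetch the configuration on every completion request, one alternative sketch is to cache it and refresh when the client pushes a change (connection.onDidChangeConfiguration comes from vscode-languageserver; depending on the client, you may also need to register for the notification):

connection.onDidChangeConfiguration((change) => {
  // The pushed settings include our section under the `myLSP` key
  config = (change.settings?.myLSP as Config) ?? {};
});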

How to configure your server with Neovim

For Neovim, we modify our vim.lsp.start invocation to add a settings table to our configuration.

vim.lsp.start {
  name = "test-ls",
  --- ...
  settings = {
    myLSP = {
      githubAccessToken = os.getenv("GITHUB_ACCESS_TOKEN"),
      searchScope = "owner:github",
      maxResults = 15,
    }
  },
}

Since we only provide these settings in our vim.lsp.start, you'll need to restart your editor for changes to take effect.

What's next?

Next, we will look at distributing our language server to make it trivial for our coworkers and friends to install and use in their editors.

If you'd like to be notified when I publish more LSP content, sign up for my newsletter.

· 10 min read
Jeffrey Chupp

What we'll build in this tutorial

Today, we'll build out completion for linking to GitHub issues and PRs.

Developers can use this completion to reference issues and PRs from their code comments and commit messages.

What is a Language Server?

Language servers let you meet developers where they are: their editor. They give you the power of in-editor tooling without rebuilding your tools for each editor you want to support. They can implement features like completion, go-to-definition, code actions, and more.

The Language Server Protocol (LSP) specifies how a language server should interface with an editor.

Approach

  1. We'll build a language server prototype with hard-coded completion choices.
  2. We'll wire this to our editor and test it out.
  3. We'll replace the hard-coded choices with dynamic content from GitHub's API.

Prototyping our Language Server

We'll build this demo in Typescript.

We install a few dependencies to handle the language server implementation details. We'll also install node-fetch to help us make API requests.

npm install vscode-languageserver vscode-languageserver-textdocument vscode-languageclient node-fetch@2
npm install --save-dev @types/vscode @types/node-fetch

(Don't let the vscode bits confuse you. While Microsoft publishes these packages, your language server will be editor-agnostic.)

Now, we can start writing our code. We'll start with distilled boilerplate code borrowed from the Visual Studio Code Language Server Extension Guide.

import {
  createConnection,
  InitializeResult,
  TextDocumentSyncKind,
  TextDocuments,
} from "vscode-languageserver/node";
import { TextDocument } from "vscode-languageserver-textdocument";

const connection = createConnection();

const documents: TextDocuments<TextDocument> = new TextDocuments(TextDocument);

connection.onInitialize(() => {
  const result: InitializeResult = {
    capabilities: {
      textDocumentSync: TextDocumentSyncKind.Incremental,
      // Tell the client that the server supports code completion
      completionProvider: {
        // We want to listen for the "#" character to trigger our completions
        triggerCharacters: ["#"],
      },
    },
  };

  return result;
});

// Make the text document manager listen on the connection
// for open, change and close text document events
documents.listen(connection);

// Listen on the connection
connection.listen();

After our imports, we set up a connection and some documents. The connection handles the communication between the editor (the client) and our code (the server). documents is a store of open files in our editor. documents stays in sync when files open, close, and change.

When the editor starts our server, onInitialize is fired. The editor sends some params describing what it can do (and what it wants our server to do), but we're not using those here, so we'll omit them. We reply with a list of things we can do (primarily, we offer completions triggered by #).

We tell documents to listen on the connection and the connection to listen for communication from the editor.

Our server could now run hosted by the editor (more on that later), but it doesn't do anything useful yet.

Let's add the initial hard-coded completion after our onInitialize.

connection.onCompletion(() => {
  return [
    {
      label: "Cannot set properties of undefined (setting 'minItems')",
      data: "https://github.com/facebook/docusaurus/issues/9271",
    },
    {
      label: "docs: fix typo in docs-introduction",
      data: "https://github.com/facebook/docusaurus/pull/9267",
    },
    {
      label: "Upgrade notification command does not copy properly",
      data: "https://github.com/facebook/docusaurus/issues/9239",
    },
  ];
});

Now, we're ready to test this out.

Wiring this up to our Editor/IDE

While the LSP standardizes how to implement Language Servers and how they talk to editors, there's no good standard for making your editor aware of a language server so that it is ready to be used. For this post, I'll cover VS Code and Neovim.

How to connect your Language Server to VS Code

You connect a language server to VS Code by creating a thin extension layer. To make things as easy as possible for you, I've published a Minimum Viable VS Code Language Server Extension. Follow the directions in that repo and replace the entire contents of server/src/server.ts with the code we're writing in this post.

After following those directions, we'll also need to install node-fetch for the server since the node version packaged with VS Code is too old to include native fetch support.

cd server && npm install node-fetch@2 && npm install --save-dev @types/node-fetch && cd -

Now, you should be able to test your extension with the steps mentioned here. You should be able to open a new file, type #, and then see something like this:

vs-code example

How to connect your Language Server to Neovim

For Neovim, you pass some configuration to vim.lsp.start. For testing, I've put the following in ~/.config/nvim/after/ftplugin/ruby.lua. Note that this location means it only runs in ruby files. Feel free to start the server elsewhere.

local capabilities = vim.lsp.protocol.make_client_capabilities()

vim.lsp.start {
  name = "test-ls",
  cmd = {
    "npx", "ts-node",
    "/Users/ship/src/lsp-example-completion/server.ts",
    "--stdio"
  },
  capabilities = capabilities
}

I'm using npx ts-node to run my server.ts without an intermediate build step, but you can use whatever approach you prefer.

You should be able to open a new file, type #, and then see something like this:

neovim example

If you encounter any trouble, check the output of :LspLog

Dynamic Completions

GitHub provides a GraphQL API. Get a GitHub access token and have it handy. It appears a classic token with "repo" permissions is required to get the search query to work correctly (at the time of writing).

Here's a query to search the Docusaurus repo for issues and PRs that include the word "bug."

{
  search(query: "repo:facebook/docusaurus bug", type: ISSUE, first: 10) {
    edges {
      node {
        ... on PullRequest {
          title
          number
          body
          url
        }
        ... on Issue {
          title
          number
          body
          url
        }
      }
    }
  }
}

We could replace repo:facebook/docusaurus with owner:facebook to scope the search to every repo owned by Facebook (that we have access to).

You can replace the searchScope with your organization or repo name when applying the changes below. Be sure to swap out the placeholder YOUR_GITHUB_ACCESS_TOKEN with your token. In a future post, we'll cover language server configuration to pass in our search scope and access token.

import fetch from "node-fetch";

type Edge = {
node: {
title: string;
number: number;
url: string;
body: string;
};
};

type Response = {
data: { search: { edges: Edge[] } };
};

// Replace with your real GitHub access token
const accessToken = "YOUR_GITHUB_ACCESS_TOKEN";

const searchScope = "repo:facebook/docusaurus";

const search = async (input: string): Promise<Response> => {
const query = `
{
search(query: "${searchScope} ${input}", type: ISSUE, first: 10) {
edges {
node {
... on PullRequest {
title
number
body
url
}
... on Issue {
title
number
body
url
}
}
}
}
}
`;

const response = await fetch("https://api.github.com/graphql", {
method: "POST",
body: JSON.stringify({ query }),
headers: {
Authorization: `Bearer ${accessToken}`,
},
});

if (!response.ok) {
const error = {
status: response.status,
statusText: response.statusText,
body: await response.json(),
};

connection.console.error(
`Error fetching from GitHub: ${JSON.stringify(error)}`,
);
}

return await response.json();
};

// ...

connection.onCompletion(async () => {
const input = "bug";

const json = await search(input);

const items = json.data.search.edges.map((edge) => {
return {
label: `#${edge.node.number}: ${edge.node.title}`,
detail: `${edge.node.title}\n\n${edge.node.body}`,
insertText: edge.node.url,
};
});

return {
isIncomplete: true,
items,
};
});

Note the isIncomplete: true here. This tells the editor that the list we're returning isn't exhaustive and might change over time.

Our language server works!

vs-code example 2

But it uses our hard-coded search string "bug" instead of completing based on our input. Let's fix that.

connection.onCompletion takes a CompletionParams argument that looks something like this:

{
  "textDocument": { "uri": "file:///private/tmp/example_file.rb" },
  "context": { "triggerKind": 3 },
  "position": { "line": 2, "character": 11 }
}

With this information and the documents we initialized earlier, we have everything we need to understand what the user was typing.

connection.onCompletion(async (params) => {
  const doc = documents.get(params.textDocument.uri);

  if (!doc) {
    return null;
  }

  const line = doc.getText().split("\n")[params.position.line];

  // return early if we don't see our trigger character
  if (!line.includes("#")) {
    return null;
  }

  const input = line.slice(line.lastIndexOf("#") + 1);

  // ...
vs-code example 3
vs-code example 4

After selecting our completion choice, you’ll notice that the # character hangs around. How do we prevent this? And why do we need a trigger character anyway?

If we didn't have a trigger character, this completion would trigger constantly as we type any code. That's wasteful for API requests and means we're doing unnecessary work -- we only want to show issues and PRs in very specific scenarios. So, the behavior should be opt-in, and a trigger character is beneficial.

To prevent the # from hanging around, we can replace our naive insertText with a TextEdit. TextEdits allow you to specify a range in the document to replace with new text. We'll modify our completion item formatting function:

const items = json.data.search.edges.map((edge) => {
  return {
    label: `#${edge.node.number}: ${edge.node.title}`,
    detail: `${edge.node.title}\n\n${edge.node.body}`,
    textEdit: {
      range: {
        start: {
          line: params.position.line,
          character: line.lastIndexOf("#"),
        },
        end: {
          line: params.position.line,
          character: params.position.character,
        },
      },
      newText: `${edge.node.url}`,
    },
  };
});
// ...

With this change, we overwrite the # so that only the new content remains.
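
As an optional refinement (not required for the tutorial), you could also give each item a kind and markdown documentation so editors render richer completion popups; CompletionItemKind and MarkupKind are exported by vscode-languageserver:

import { CompletionItemKind, MarkupKind } from "vscode-languageserver/node";

const items = json.data.search.edges.map((edge) => ({
  label: `#${edge.node.number}: ${edge.node.title}`,
  // Rendered as an icon/category hint in most editors
  kind: CompletionItemKind.Reference,
  documentation: {
    kind: MarkupKind.Markdown,
    value: `**${edge.node.title}**\n\n${edge.node.body}\n\n${edge.node.url}`,
  },
  textEdit: {
    range: {
      start: { line: params.position.line, character: line.lastIndexOf("#") },
      end: { line: params.position.line, character: params.position.character },
    },
    newText: edge.node.url,
  },
}));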

Final code

import fetch from "node-fetch";
import {
createConnection,
InitializeResult,
TextDocumentSyncKind,
TextDocuments,
} from "vscode-languageserver/node";
import { TextDocument } from "vscode-languageserver-textdocument";

const connection = createConnection();

const documents: TextDocuments<TextDocument> = new TextDocuments(TextDocument);

type Edge = {
node: {
title: string;
number: number;
url: string;
body: string;
};
};

type Response = {
data: { search: { edges: Edge[] } };
};

// Replace with your real GitHub access token
const accessToken = "YOUR_GITHUB_ACCESS_TOKEN";

const searchScope = "repo:facebook/docusaurus";

const search = async (input: string): Promise<Response> => {
const query = `
{
search(query: "${searchScope} ${input}", type: ISSUE, first: 10) {
edges {
node {
... on PullRequest {
title
number
body
url
}
... on Issue {
title
number
body
url
}
}
}
}
}
`;

const response = await fetch("https://api.github.com/graphql", {
method: "POST",
body: JSON.stringify({ query }),
headers: {
Authorization: `Bearer ${accessToken}`,
},
});

if (!response.ok) {
const error = {
status: response.status,
statusText: response.statusText,
body: await response.json(),
};

connection.console.error(
`Error fetching from GitHub: ${JSON.stringify(error)}`,
);
}

return await response.json();
};

connection.onInitialize(() => {
const result: InitializeResult = {
capabilities: {
textDocumentSync: TextDocumentSyncKind.Incremental,
// Tell the client that the server supports code completion
completionProvider: {
// We want to listen for the "#" character to trigger our completions
triggerCharacters: ["#"],
},
},
};

return result;
});

connection.onCompletion(async (params) => {
const doc = documents.get(params.textDocument.uri);

if (!doc) {
return null;
}

const line = doc.getText().split("\n")[params.position.line];

// return early if we don't see our trigger character
if (!line.includes("#")) {
return null;
}

const input = line.slice(line.lastIndexOf("#") + 1);

const json = await search(input);

const items = json.data.search.edges.map((edge) => {
return {
label: `#${edge.node.number}: ${edge.node.title}`,
detail: `${edge.node.title}\n\n${edge.node.body}`,
textEdit: {
range: {
start: {
line: params.position.line,
character: line.lastIndexOf("#"),
},
end: {
line: params.position.line,
character: params.position.character,
},
},
newText: `${edge.node.url}`,
},
};
});

return {
isIncomplete: true,
items,
};
});

// Make the text document manager listen on the connection
// for open, change and close text document events
documents.listen(connection);

// Listen on the connection
connection.listen();

We've only scratched the surface of what language servers can do. There's still plenty of the LSP left to explore.

Read the next article Hands-On LSP Tutorial: Configuration.

If you'd like to be notified when I publish more LSP content, sign up for my newsletter.

· 7 min read
Jeff Dwyer

What is Tagged Logging?

Should you be using it? What libraries support it? Is it the same as structured logging?

Let's start with an example. Here's ActiveSupport::TaggedLogging in action.

class Permissions
  def get(user)
    logger.tagged("Auth API") do
      uri = URI("https://auth.example.com/permissions/?user=#{user}")

      logger.info { "Fetching #{uri}" }
      permissions = Permissions.parse(Net::HTTP.get(uri))
      logger.info { "Got permissions #{permissions.to_s}" }
    end
  end
end

# Log Output
# [Auth API] Fetching https://auth.example.com/permissions/?user=bob
# [Auth API] Got permissions admin, writer, reader

An example from Justin Weiss shows us how we might do this for users:

logger.tagged(current_user_id ? "user-#{current_user_id}" : "user-anonymous") do
  logger.debug "Finding people..."
  @people = Person.all
  logger.debug "Found #{@people.length} people!"
end

# Log Output
# [user-123] Finding people...
# [user-123] Person Load (0.3ms) SELECT "people".* FROM "people"
# [user-123] Found 0 people!

What's Good About Tagged Logging

Good Readability / Grep-ability

It's nice that the log lines for multiple parts of a request have the same tag. That's going to be helpful with ctrl-f locally or grep in production.

It's DRY (Don't Repeat Yourself)

We put the user.id in one place, and it is now in all the log output. We could even put that in an around_action and have it in all logging for our request.

What's Not as Good

Poor Machine Readability

logger.tagged "user-#{user.id}" do
  logger.tagged team.name do
    logger.info "hello"
  end
end

# The plain text version of the log reads nicely
# INFO [user-123] [The Best Team] hello

# But the JSON Log isn't ideal
# {severity: INFO, message: "hello", tags: ["user-123", "The Best Team"]}

Almost every log aggregator will support JSON logging and user-123 won't be as easy to search as {user: 123}. With the structured format you're going to be able to display and search the column user for 123. Additionally, the team name is just floating there without a key. This post Structured logging in Rails lays out the problem well. Does your logging aggregator want:

[2018-05-13T17:51:08.772542] [INFO] [127.0.0.1] [2892] Customer registration error - SSN already exists.

Or

{
  "timestamp": "2018-05-13T17:51:08.772542",
  "source": "127.0.0.1",
  "msg": "SSN already exists.",
  "level": "INFO",
  "user_id": 2892,
  "error_type": "customer_registration"
}

The answer is almost assuredly the latter.

The gem json_tagged_logger gets in the fight here. It keeps the same tagging behavior but puts the tags in a tags key.

Rails.logger.tagged("TAG") do
  Rails.logger.tagged("NESTED").info("nested tagged log message")
end

# Log Output
{
  "level": "INFO",
  "request_id": "914f6104-538d-4ddc-beef-37bfe06ca1c7",
  "host": "127.0.0.1",
  "my_param": "param value",
  "user_id": "99",
  "my_custom_header": "some header value",
  "tags": [
    "TAG",
    "NESTED"
  ],
  "msg": "nested tagged log message"
}

If you really like tagging, this is a good move.

🤔 But are these things you're tagging really "tags", or should they be new key-value pairs? In a practical example, the code is more likely something like this:

Rails.logger.tagged("BillingJob") do
  Rails.logger.tagged("User #{user.id}").info("nested tagged log message")
end

and wouldn't that be better JSON as:

{
  "class": "BillingJob",
  "user": 1234
}

vs

{ "tags": ["BillingJob", "User 1234"] }

If you're nodding along, at this point, you're probably wondering whether maybe we should just write:

logger.info "nested message", user: 1234, class: "BillingJob"

If so, congratulations, you've just invented structured logging!

What's Structured Logging?

As you can see, structured logging is pretty straightforward. It lets us do something like this:

log.info "hello", user: "1", team: "The best team"

# Text Log
# INFO hello user=1, team=The Best Team
# JSON Log
# {severity: INFO, message: "hello", user: 1, team: "The Best Team"}

Hey now, that really is better.

So, what libraries do this? Quite a few, actually. Here's what they look like:

logcraft

logger = Logcraft.logger 'Application'

logger.info 'Log message'
# => {"timestamp":"2022-...","level":"INFO","logger":"Application","pid":80422,"message":"Log message"}

logger.info message: 'User logged in', user_id: 42
# => {"timestamp":"2022-...","level":"INFO","logger":"Application","pid":80422,"message":"User logged in","user_id":42}

I don't really dig having to say message: but it's not the end of the world. Logcraft has around 5k downloads on rubygems as of today.

ougai

Here's an excerpt from the ougai docs:

logger.info('Information!')
#{"name":"main","hostname":"mint","pid":14607,"level":30,"time":"2016-10-16T22:26:48.835+09:00","v":0,"msg":"Information!"}

logger.info({
  msg: 'Request', method: 'GET', path: '/login',
  format: 'html', controller: 'LoginController',
  action: 'new', status: 200
})
#{"name":"main","hostname":"mint","pid":9044,"level":30,"time":"2016-10-28T17:58:53.668+09:00","v":0,"msg":"Request","method":"GET","path":"/login","format":"html","controller":"LoginController","action":"new","status":200}

Again with the msg: but it's not the end of the world. Ougai has around 8.4M downloads on rubygems as of today.

semantic_logger

logger.debug('log this', foo: 'foo', bar: 'bar')
# Text
#DEBUG 2023-09-06 10:31:18 -0400: {foo: foo,bar: bar} log this
# JSON
# {"named_tags":{"foo":"foo","bar":"bar"},"name":"Post","message":"log this"}

Semantic Logger has about 12M downloads, and there is also a rails_semantic_logger gem with Rails-specific integrations.

What About Lograge?

Lograge is sort of a structured logger. It does output structured logs for your Rails requests, but it doesn't give you a way to structure your own logs. It has 101M downloads and is a proven tool, but I'm not including it here because we're talking about structured logging and tagged logging in general.

So, is Tagged Logger worse than Structured Logging?

Well, yes. Mostly. Honestly I didn't have an opinion before I really dug in and researched this, but in my view structured logging can give us the same benefits as tagged logging, but in a cleaner, more machine-readable way that fits better with modern log aggregators.

🤔 However there is one thing I found valuable with tagged logging and that is nested contexts. It’s a convenient way to propagate context instead of adding it to each log entry. Does using structured logging mean we have to give that up?

No! Fortunately, we can do this in SemanticLogger as well. It looks like this:

logger.tagged(user: 1) do
  logger.info "nested"
end

# Log Output
# nested user=1
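
And to get that nested context out as JSON for your log aggregator, a minimal SemanticLogger setup might look like this (the appender and formatter options are from the SemanticLogger docs; the logger name is made up):

require "semantic_logger"

# Emit JSON to stdout so named tags become structured keys
SemanticLogger.add_appender(io: $stdout, formatter: :json)

logger = SemanticLogger["Checkout"]

logger.tagged(user: 1) do
  logger.info "nested"
end
# => {"named_tags":{"user":1},"name":"Checkout","message":"nested",...}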

Could it be better?

My last gripe would be that I'm repeating this context. Isn't this the same context I'm using for my feature flag library, Datadog, and Sentry?

Sentry.configure_scope do |scope|
  scope.set_context(
    'user',
    {
      name: 'Bob Beemer',
      id: 1234,
    }
  )
end

Why do I need to repeat this again for logging? Can't I just do it once?

The Ideal

I ended up writing this post because we're working on a dynamic log library, and I wanted to make sure we had all our bases covered with respect to structured logging and tagged logging.

Semantic logger is pretty darn good, but if we're going to really DRY things up I could reuse the context I use for feature flags. Here's what that might look like.

My proposed ideal code

# application_controller.rb
class ApplicationController < ActionController::Base

  around_action do |_, block|
    # Put our context together in one place for the whole request
    $prefab.with_context({
      user: {
        key: current_user.tracking_id,
        id: current_user.id,
        name: current_user.name,
      }
    }, &block)
  end
end

# application.rb
module MyApplication
  class Application < Rails::Application
    # tell the logger to add the values of these contexts to log messages
    logger.add_context_keys "user.id", "user.name"

    # or add extra logging context in a block
    logger.with_context(runtime: { value: 42 }) do
      logger.info "hello"

      # Feature flag value evaluation uses existing context for user.id and runtime.value
      $prefab.get("my-feature-flag")
    end
  end
end

# Log Output
# INFO hello user.id=1234, user.name=Bob Beemer, runtime_value=42

This is the approach we're planning on, but I'd love to hear your thoughts.

Summary

  1. Tagged logging is nice.
    1. Tagged logging makes decent-looking console output.
    2. Tagged logging requires adding things manually.
    3. Tagged logging is a bit like structured logging, but in a way that log aggregators can't use.
  2. Structured logging is better.
  3. But ideally structured logging can also add structure in a block.
  4. Semantic Logger is a great library that does this, and it's the same approach we're taking at Prefab.

· 7 min read
James Kebinger

Changing the log level in Java

Programmatically changing the log level in Java is easy. The only up-front hitch is that you just can't do it via SLF4J -- you'll need to use the underlying logging framework.

Log4j has a Configurator class we can use and in Logback we simply get the logger and set the level. Here's what that looks like.

// Log4J
Configurator.setLevel(loggerName, Level.INFO);

// Logback
Logger logger = (Logger)LoggerFactory.getLogger(loggerName);
logger.setLevel(Level.INFO);

But that's not really why you're here, is it?

What you're actually after is changing the log level without restarting the application. What you're actually after is getting more debug information from a production system without having to redeploy or turn off excessive logging that is costing you money in log ingestion costs.

Changing the log level programmatically is only half the battle (or maybe a 1/10th of the battle). In any real application, we now need to figure out how this code gets called.

This challenge is more complex, but it breaks down into two big parts:

  1. Where does the user input the logger and level?
  2. How does this system broadcast this information to all services?

Capturing the user's desire to change the log level

The developer wants more debug information about FooClass. How do we capture this intent? Collecting the changes comes in a few flavors.

An API request to change the levels

Your developer can POST to an internal URL. Something like POST /log-levels?logger=com.example.FooClass&level=DEBUG would work. The question here will be how to broadcast this change to all instances of the service.
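
A minimal sketch of such an endpoint (Spring-style, with hypothetical names) might look like this; note it only changes the level on the instance that happens to receive the request:

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class LogLevelController {

  // POST /log-levels?logger=com.example.FooClass&level=DEBUG
  @PostMapping("/log-levels")
  public String setLevel(@RequestParam String logger, @RequestParam String level) {
    // Only affects this instance; broadcasting is the next problem
    Configurator.setLevel(logger, Level.valueOf(level.toUpperCase()));
    return "Set " + logger + " to " + level;
  }
}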

A JMX Request to change the levels

Log4j supports JMX to trigger configuration reloads or edit the config file itself. There's even a client GUI! As with using an API request to a custom endpoint, that'll only work on one application instance at a time.

A Shared File

Your developer can edit a shared log4j.properties or logback.xml somewhere. This is the most common way to do this, but sharing the file around can be an annoying infrastructure challenge and the behavior if you corrupt the file is not always obvious. Log4j also supports (re)loading configuration from a URI which is still basically just a remote shared file or files.

A Centralized UI

Maybe you build an internal UI or you use something like SpringBoot Admin or Prefab. The Spring Boot Admin UI looks like this:

centralized log level change UI in spring boot

The Prefab UI is similar, giving us a hierarchical view as well as some feedback on log volume at each level.

centralized log level change UI in Prefab

A centralized UI may or may not have knowledge about the loggers that you're using. Prefab knows both the logger names and how frequently they are used, so it can render this helpful UI. That means you don't have to type the logger name correctly yourself, and you can also see how many messages/sec are happening at each logger. To do this, you'll need to send aggregate telemetry back to a central location on logging calls.

Broadcasting the change to all instances of the service

Once we've captured the intent to change the log level, we need to broadcast this change to all instances of the service. While some of the approaches like shared files or URI-based configuration will handle this easily, others like the API endpoint approach have shortcomings in this department. Hit that custom logging configuration API and the load balancer delivers the POST to one instance of our service, where the log level changes. But... what about the other instances? Hope that was the one you wanted!

Broadcasting the change is typically handled in one of the following ways:

Ignoring this problem

We could skip this step and only change the log levels on the one instance that received the user intent. Depending on your use case, this may be sufficient. If you only have a single instance, or you are just looking for a sampling of logs, it should serve you well. If you have multiple instances and you are looking for a specific log message, this will not work and it may lead to confusion.

Service Discovery

We could implement some type of service discovery to propagate the change to peer application instances. If each instance is aware of the other instances, then we can broadcast the change to all of them. It's unlikely you're going to want to build this out yourself just for this logging use case, but if this is technology you already have in place, it could be a reasonable option.

NFS File Sync / URI Sync

Having our log4j.properties synced over something like NFS and setting clients to monitor/scan the file/URI for changes. Log4j uses <Configuration monitorInterval="30"> to refresh. In Logback this looks like <configuration scan="true" scanPeriod="30 seconds">.

This may be a reasonable option if you already have a shared file system in place, but wouldn't be the first choice for most people.

PubSub

We can listen to a PubSub system for changes and programmatically configure the logging system using the code above. Your API endpoint could publish to a Redis pub/sub channel, and all instances would listen for changes. Sandeep has a full-fledged example of this (in Python) in route triggered state update across workers.

This is likely the easiest solution if you are going to DIY. Redis Pub/Sub is easy to use; you'll just need to decide on a message format.

One wrinkle is that the log level change will reset as instances reboot. Pub/Sub is fleeting: instances that are listening at the time of the change will set their levels, but new instances that start afterwards will not see the change. If you want new instances to pick up the change, you'll need to use something like Kafka or Redis Streams, which persist the message and allow new instances to pick it up.
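As a rough sketch of the Redis approach (assuming Jedis, Log4j2, and a made-up log-level-changes channel carrying logger:level messages), each instance could subscribe and apply changes like this:

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class LogLevelSubscriber {

  public static void main(String[] args) {
    JedisPubSub listener = new JedisPubSub() {
      @Override
      public void onMessage(String channel, String message) {
        // Assumed message format: "com.example.FooClass:DEBUG"
        String[] parts = message.split(":", 2);
        if (parts.length == 2) {
          Configurator.setLevel(parts[0], Level.valueOf(parts[1]));
        }
      }
    };

    // subscribe blocks this thread and invokes onMessage for each published change
    try (Jedis jedis = new Jedis("localhost", 6379)) {
      jedis.subscribe(listener, "log-level-changes");
    }
  }
}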

Filter & Dynamic Configuration

The last option is a bit different. Rather than listen for changes and set the log level, we can use a log filter to decide whether or not to log a message and have this filter look up the current log level by looking into a dynamic configuration system.

This dynamic configuration system will use some form of pub/sub to ensure it's working from the most current configuration, so this could also be seen as a variation of #4, but it's distinct in that the pub/sub system is not a one-off just for logging. This is what Prefab does: we use SSE to get the changes and then a Logback TurboFilter that looks up the current value. Pseudo code for this looks like:

public FilterReply decide(
  Marker marker,
  Logger logger,
  Level level,
  String s,
  Object[] objects,
  Throwable throwable
) {
  Optional<Prefab.LogLevel> loglevelMaybe = dynamicConfig.getLogLevel(logger);
  if (loglevelMaybe.isPresent()) {
    Level calculatedMinLogLevelToAccept = LogbackLevelMapper.LEVEL_MAP.get(
      loglevelMaybe.get()
    );
    if (level.isGreaterOrEqual(calculatedMinLogLevelToAccept)) {
      return FilterReply.ACCEPT;
    }
    return FilterReply.DENY;
  }
  return FilterReply.NEUTRAL;
}

Implementation as a filter has the advantages of:

  1. Gathering information about log requests, whether or not they make it to a log file
  2. Allowing custom log levels based on information beyond the class name, like application state or local context
  3. Having more context at decision time, so you can log at DEBUG only for a certain user instead of just a certain class

To install this filter, check out the guide on setting up dynamic logging with Logback.

Summary

Dynamic logging is a great way to get more information out of your production system without having to redeploy.

At first, the problem looks like a simple question of figuring out how to change the log level programmatically, but as we get into the real world, we see that the real challenge is:

  1. Building an ergonomic solution for our developers to input their desired log levels
  2. Propagating this change across our service instances

Building this ourselves is very doable. Adding a POST endpoint to our services and using Redis Pub/Sub to broadcast the change is a perfectly viable solution. Alternatively, a service like Prefab gives us this capability out of the box, with some nice bells and whistles like a UI, log volume information, and advanced targeting.

· 5 min read
James Kebinger

A health check is a page or endpoint that can indicate whether an application can properly serve requests. Health checks are a useful diagnostic mechanism, and it's handy to have them for all of your application's dependencies. On the other hand, a failing health check can prevent your application instances from serving requests that are unaffected by the failing dependency, to the point of complete downtime. How do you keep your application's instances or pods healthy in the face of failing health checks?

There’s a fair amount written on the topic of health check management already (some links below), including approaches such as these:

  • Break down services to have simpler functionality and dependency graphs
  • Categorize your critical dependencies differently from your nice-to-have dependencies
  • Separate liveness checks from readiness checks
  • Think about failing-open vs closed when all health checks have gone bad

The Problem

These are the right approaches, but it's tough to get them right on the first shot. In practice, this requires a fair amount of tuning to dial in the right behavior. The upshot is that it's fairly common to end up in a situation where a health check starts failing and your application instances or pods are withdrawn from service, but you'd prefer to keep them up and running.

This Redis healthcheck is failing and taking our pods out of service, but the apps are still mostly functional. How do we fix this immediately?

Malleability To The Rescue

To deal with the problem right now, we took an alternative approach: dynamically manage an ignore list of health checks that are excluded from aggregation into the overall health status.

We recently encountered an issue with one of our Redis hosts which caused downtime in our staging environment, as pods considered unhealthy were withdrawn from service (despite most core functionality being unaffected). Once we realized what was going on, I quickly subclassed Micronaut's DefaultHealthAggregator and created a dynamically configured ignore list to exclude certain health checks. I also added logging to gain better visibility into failing health checks. These changes give us more visibility and flexibility to quickly handle similar dependency failures in the future.

Micronaut Health Checks Example

@Singleton
@Requires(beans = HealthEndpoint.class)
@Replaces(bean = DefaultHealthAggregator.class)
public class PrefabHealthAggregator extends DefaultHealthAggregator {

  private static final Logger LOG = LoggerFactory.getLogger(PrefabHealthAggregator.class);
  private static final String CONFIG_KEY = "micronaut.ignored.health.checks";
  private final Value<List<String>> ignoredCheckNames;

  public PrefabHealthAggregator(
    ConfigClient configClient,
    ApplicationConfiguration applicationConfiguration
  ) {
    super(applicationConfiguration);
    this.ignoredCheckNames = configClient.liveStringList(CONFIG_KEY);
  }

  @Override
  protected HealthStatus calculateOverallStatus(List<HealthResult> results) {
    List<String> ignoredResultNames = ignoredCheckNames.orElseGet(Collections::emptyList);

    return results
      .stream()
      .filter(healthResult -> {
        if (ignoredResultNames.contains(healthResult.getName())) {
          LOG.warn(
            "Ignoring health check {} with status {} and details {} based on prefab config {}",
            healthResult.getName(),
            healthResult.getStatus(),
            healthResult.getDetails(),
            CONFIG_KEY
          );
          return false;
        }
        if (healthResult.getStatus() != HealthStatus.UP) {
          LOG.warn(
            "Unhealthy status for healthcheck {} with status {} and details {}. To ignore add name to prefab config {}",
            healthResult.getName(),
            healthResult.getStatus(),
            healthResult.getDetails(),
            CONFIG_KEY
          );
        } else {
          LOG.debug(
            "passing health result named: {} with status {} and details {}",
            healthResult.getName(),
            healthResult.getStatus(),
            healthResult.getDetails()
          );
        }
        return true;
      })
      .map(HealthResult::getStatus)
      .sorted()
      .distinct()
      .reduce((a, b) -> b)
      .orElse(HealthStatus.UNKNOWN);
  }
}

Ignoring a Health Check & Verifying It Works

Now in dynamic configuration we can add a list of health checks to ignore. We'll create a List<String> with the name micronaut.ignored.health.checks and then add the name of the health check we want to ignore. For testing, let's add redis to the list in staging.

healthchecks that are ignored

Dynamic configuration will propagate this value out to all connected SDKs. But how do we know that it's all working? Well, luckily for us we have some useful debug logging. To verify that it's working, we can turn on debug logging for the PrefabHealthAggregator class in staging using dynamic logging.

healthchecks for logging

Then we can look over in our logging aggregator and see that the health checks are being ignored:


WARN 10:52:10 Ignoring healthcheck redis with status UP and details {} based on prefab config micronaut.ignored.health.checks

DEBUG 10:52:10 passing healthcheck grpc-server with status UP and details {host=localhost, port=60001}

DEBUG 10:52:10 passing healthcheck diskSpace with status UP and details {total=101203873792, free=49808666624, threshold=10485760}

DEBUG 10:52:10 passing healthcheck service with status UP and details null

DEBUG 10:52:10 passing healthcheck compositeDiscoveryClient() with status UP and details {services={}}

DEBUG 10:52:10 passing healthcheck jdbc:postgresql://1.1.1.1:5432 with status UP and details {database=PostgreSQL, version=15.2}

Summary

Healthchecks are great, but they take some tuning to get right. Unfortunately, getting them wrong can cause unnecessary downtime.

Luckily, we can use dynamic configuration to tune our healthchecks and get to the right behavior as quickly as possible.

· 8 min read
Jeffrey Chupp
Lost in an avalanche of needles

Front-end logging can feel like finding a needle in an avalanche of haystacks.

Since the browser is the bridge between user interaction and the back-end, front-end logs have the potential to be a treasure trove of insights.

Datadog and others offer browser log collection, but everyone I've talked to who has tried it has turned it off. Why? Way too expensive.

How can we get meaningful signal from our front-end logging? Three steps:

  1. Transports: We need to put some logic between the log statement and the aggregator. Learn how to direct your logs to the right places and why it matters.
  2. Context: We need to give that logic context, so it can decide what it shouldLog. Crafting actionable error messages and adding the right context will make your logs meaningful.
  3. Targeting: We need some rule evaluation for Laser targeting. We’ll go over how to selectively capture logs for troubleshooting without overwhelming your system.

Transports: Sending the right logs to the right places

Step one in front-end logging is figuring out how to get access to the content. It doesn't matter how good your logging is if it only exists in your end-user's browser.

You'll likely want to sign up with a SaaS for log aggregation rather than rolling your own. We'll use Datadog as our example.

After following the installation steps for @datadog/browser-logs, you're able to use datadogLogs.logger.info (or error, or debug, etc.) wherever you'd normally use console.log.

🤔 We probably don't want to replace console.log entirely, right? For local development, it'd sure be nice to see the logs in my browser rather than going to the Datadog UI.

This is where we start thinking in "transports." There's no reason all our logging should go to the same places. Instead, we can have a logger with intelligent routing. This opens up possibilities like

  • Everything in local dev stays local in the console. Nothing is suppressed.
  • Production shows only ERROR messages in the console. Less severe messages don't show up.
  • Production sends anything INFO or more severe to Datadog.

Here's a rough sketch of what this might look like:

import { datadogLogs } from "@datadog/browser-logs";

const ERROR = 0;
const INFO = 1;
const DEBUG = 2;

const severityLookup = {
  [ERROR]: "error",
  [INFO]: "info",
  [DEBUG]: "debug",
};

const datadogTransport = (severity, message) =>
  datadogLogs.logger[severity](message);

const consoleTransport = (severity, message) => console[severity](message);

export const logger = (environment) => {
  const transports = [];

  if (environment === "production") {
    transports.push([datadogTransport, { minSeverity: INFO }]);
    transports.push([consoleTransport, { minSeverity: ERROR }]);
  } else {
    transports.push([consoleTransport]);
  }

  const report = (severity, message) => {
    transports.forEach(([transport, options]) => {
      const { minSeverity } = options || {};
      if (minSeverity === undefined || severity <= minSeverity) {
        transport(severityLookup[severity], message);
      }
    });
  };

  return {
    debug: (message) => {
      report(DEBUG, message);
    },
    info: (message) => {
      report(INFO, message);
    },
    error: (message) => {
      report(ERROR, message);
    },
  };
};

To easily compare levels, we treat them as numbers. How many log levels you want and what they should be is a subject of some debate.

Context is King

There's an art to crafting a good error message.

Without further details, "Something went wrong during checkout" is not actionable and is likely to frustrate the developer on the receiving end.

"[checkout]: Charge has an invalid parameter" is a better message but still not yet actionable. Who did this happen to? What were they trying to buy?

Our logs are most actionable when we couple a well-crafted error message with metadata about the state of the user and app. Datadog and other offerings support this metadata out-of-the-box.

Consider the following

logger.error("[checkout]: charge has an invalid parameter", {
customerId: "cus_9s6XKzkNRiz8i3",
cart: { plan: "Pro", credits: desiredCredits },
});

This logs something like

ERROR: [checkout]: Charge has an invalid parameter | customerId=cus_9s6XKzkNRiz8i3 cart.plan="Pro" cart.credits=-10

They tried to buy the Pro plan and -10 credits? -10 looks weird! That's a thread to pull on.

Adding context to our example logger implementation is easy enough. We can just add an extra context variable (already supported by @datadog/browser-logs) throughout.

Updated code

import { datadogLogs } from "@datadog/browser-logs";

const ERROR = 0;
const INFO = 1;
const DEBUG = 2;

const severityLookup = {
  [ERROR]: "error",
  [INFO]: "info",
  [DEBUG]: "debug",
};

const datadogTransport = (severity, message, context) =>
  datadogLogs.logger[severity](message, context);

const consoleTransport = (severity, message, context) =>
  console[severity](message, context);

export const logger = (environment) => {
  const transports = [];

  if (environment === "production") {
    transports.push([datadogTransport, { minSeverity: INFO }]);
    transports.push([consoleTransport, { minSeverity: ERROR }]);
  } else {
    transports.push([consoleTransport]);
  }

  const report = (severity, message, context) => {
    transports.forEach(([transport, options]) => {
      const { minSeverity } = options || {};
      if (minSeverity === undefined || severity <= minSeverity) {
        transport(severityLookup[severity], message, context);
      }
    });
  };

  return {
    debug: (message, context) => {
      report(DEBUG, message, context);
    },
    info: (message, context) => {
      report(INFO, message, context);
    },
    error: (message, context) => {
      report(ERROR, message, context);
    },
  };
};

We ship our logger and everything is working great. Developers are adding more logging and it is easier than ever to track down issues in local dev. But since we're being strategic about what gets shipped to Datadog, we're not paying for those console.debug lines.

Laser Targeting

We start getting reports that the activity feed is broken for the user with trackingId=abc-123. How can we use our approach to access the DEBUG-level logs for this user without recording those same DEBUG-level logs for every user?

You could tweak the report innards to consider the context:

const TRACKING_IDS_TO_ALWAYS_LOG = ["abc-123"];

const report = (severity, message, context) => {
  transports.forEach(([transport, options]) => {
    const { minSeverity } = options || {};
    if (
      minSeverity === undefined ||
      severity <= minSeverity ||
      TRACKING_IDS_TO_ALWAYS_LOG.includes(context?.trackingId)
    ) {
      transport(severityLookup[severity], message, context);
    }
  });
};

This gives us a nice mechanism for allow-listing certain tracking ids for verbose logging.

logger.debug("test", { trackingId: user.trackingId });

// Examples
logger.debug("test", { trackingId: "abc-123" }); // This is sent to Datadog (trackingId matches)
logger.debug("test", { trackingId: "xyz-999" }); // This is not sent

A hard-coded array of tracking ids means we'll need to PR, merge, deploy, etc. to get this change out, but this is still a powerful approach. We'll get all the details we need to fix the user's problem. Once fixed, we can remove their tracking id from the array and do the PR/merge/deploy dance once more.

Dynamic Targeting with Prefab

That all works great. You could call it a day and move on, confident and happy that you've enabled yourself and other devs to log better than ever.

But let me offer you a superpower: sign up for Prefab and install @prefab-cloud/prefab-cloud-js (or the React flavor). Now you can target based on any context data to get just the right logs to just the right places -- without having to change any code after the initial setup.

Here are the changes needed to start using prefab.shouldLog:

import { prefab, Context } from "@prefab-cloud/prefab-cloud-js";

// Set up prefab with the context of the current user
const options = {
  apiKey: "YOUR-API-KEY-GOES-HERE",
  context: new Context({
    user: { trackingId: "abc-123", email: "test@example.com" },
    device: { mobile: true },
  }),
};
prefab.init(options);

// ...

export const logger = (environment, loggerName) => {
  // ...
  const report = (severity, message, context) => {
    transports.forEach(([transport, options]) => {
      const { defaultLevel } = options || {};

      // Use prefab to check if we should log based on the user context
      if (
        prefab.shouldLog({
          loggerName,
          desiredLevel: severityLookup[severity],
          defaultLevel: severityLookup[defaultLevel] ?? "debug",
        })
      ) {
        transport(severityLookup[severity], message, context);
      }
    });
  };
  // ...

Now we can use the tools in the Prefab UI to target trackingId abc-123 and add and remove users without shipping code changes. Because you can set up custom rules to target whatever you provide in your context, we can even do things like target specific users on specific devices.

Full updated code

import { prefab, Context } from "@prefab-cloud/prefab-cloud-js";
import { datadogLogs } from "@datadog/browser-logs";

// Set up prefab with the context of the current user
const options = {
  apiKey: "YOUR-API-KEY-GOES-HERE",
  context: new Context({
    user: { trackingId: "abc-123", email: "test@example.com" },
    device: { mobile: true },
  }),
};
prefab.init(options);

const ERROR = 0;
const INFO = 1;
const DEBUG = 2;

const severityLookup = {
  [ERROR]: "error",
  [INFO]: "info",
  [DEBUG]: "debug",
};

const datadogTransport = (severity, message, context) =>
  datadogLogs.logger[severity](message, context);

const consoleTransport = (severity, message, context) =>
  console[severity](message, context);

export const logger = (environment, loggerName) => {
  const transports = [];

  if (environment === "production") {
    transports.push([datadogTransport, { defaultLevel: INFO }]);
    transports.push([consoleTransport, { defaultLevel: ERROR }]);
  } else {
    transports.push([consoleTransport]);
  }

  const report = (severity, message, context) => {
    transports.forEach(([transport, options]) => {
      const { defaultLevel } = options || {};

      // Use prefab to check if we should log based on the user context
      if (
        prefab.shouldLog({
          loggerName,
          desiredLevel: severityLookup[severity],
          defaultLevel: severityLookup[defaultLevel] ?? "debug",
        })
      ) {
        transport(severityLookup[severity], message, context);
      }
    });
  };

  return {
    debug: (message, context) => {
      report(DEBUG, message, context);
    },
    info: (message, context) => {
      report(INFO, message, context);
    },
    error: (message, context) => {
      report(ERROR, message, context);
    },
  };
};

Wait, you forgot to tell me which logging library to install!

You might not need one. I'd hold off picking until I knew more about why I wanted one.

IMHO, there isn't a clear winner. Since the major players (e.g. pino and winston) support providing your own transports, you can pick whichever you prefer and always change your mind later.

· 3 min read
James Kebinger

Google Spanner is a remarkable piece of technology. It's a globally distributed, strongly consistent, relational database that scales horizontally with a 99.999% SLA. With the recent addition of granular instance sizing, it's now possible to start with Spanner at $40/month. Under the hood, it even uses atomic clocks! How cool is that?

If you decide to try Spanner, right off the bat you've got to choose between two dialects, and it's not clear which one is the best choice.

choose your fighter

Spanner comes in two flavors: GoogleSQL (an ANSI-compliant SQL) and PostgreSQL. Unfortunately, it's a permanent choice on a per-database basis, and it will have a significant impact on the code you write, so you need to choose wisely!

Google provides a guide that can help you make an informed decision regarding the choice between PostgreSQL and GoogleSQL, along with a description of which Postgres features are and are not supported. These documents are a good starting point for your decision-making process.

we chose poorly

As a long-time fan of Postgres, I was initially happy that we had chosen the PostgreSQL dialect. However, after becoming more familiar with its limitations compared to GoogleSQL, the PostgreSQL dialect feels like a second-class citizen in Google Spanner. Therefore, we have recently started creating new tables in a GoogleSQL dialect database instead. Here are the details of our experience.

Poor Selection of Functions

Spanner implements many Postgres functions, but when I started to do some in-database rollups of data within time windows, I quickly reached for good old date_trunc and found it unimplemented, along with date_bin. The list of supported Postgres datetime functions is short and undistinguished. GoogleSQL has much better date and time functions.

If you think you might query by date or timestamp in your database, you should probably use GoogleSQL.
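For example, the kind of daily rollup I was reaching for is straightforward in the GoogleSQL dialect with TIMESTAMP_TRUNC (the log_events table and created_at column here are hypothetical):

import com.google.cloud.spanner.Statement;

// GoogleSQL dialect: bucket rows by day -- there's no date_trunc equivalent in the PostgreSQL dialect
Statement statement = Statement.of(
  "SELECT TIMESTAMP_TRUNC(created_at, DAY, 'UTC') AS day, COUNT(*) AS events " +
  "FROM log_events " +
  "GROUP BY 1 ORDER BY 1"
);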

Missing PostgreSQL Emulator

Google has a Spanner emulator, runnable from a Docker image, for the GoogleSQL dialect, which enables cheap, conflict-free testing in local and CI/CD environments. PostgreSQL dialect users must pay to run tests against real Spanner database instances, with all the headaches that come with that approach. There's an issue for this, but it's been almost a year since the last update.

Named Placeholder Support

The PostgreSQL Spanner driver makes you write ugly code like this to build statements:

Statement.Builder builder = Statement
  .newBuilder("SELECT * FROM cool_table WHERE project_id = $1 AND id >= $2")
  .bind("p1")
  .to(projectId)
  .bind("p2")
  .to(startAt);

The GoogleSQL Spanner driver lets you use this much more pleasant named placeholder approach:

Statement.Builder builder = Statement
  .newBuilder("SELECT * FROM cool_table WHERE project_id = @projectId AND id >= @id")
  .bind("projectId")
  .to(projectId)
  .bind("id")
  .to(startAt);

This distinction is documented here.

Your Mileage May Vary

The PostgreSQL dialect might be a good fit for an application already using Postgres that also happens not to use any of the missing Postgres features. For us, the compatibility isn't quite there, so we'll be steadily moving towards GoogleSQL.