AWS Open Source Blog
Go support for AWS X-Ray now available in AWS Distro for OpenTelemetry
In this blog post, AWS interns Wilbert Guo and Kelvin Lo share their experience in enhancing the OpenTelemetry Go SDK to support sending traces to AWS X-Ray. These enhancements are also available in the AWS Distro for OpenTelemetry.
AWS X-Ray is a service that collects data and provides tools that allow us to view, filter, and gain insights into that data in order to identify issues and areas that could be optimized. The data that it collects includes trace
data that traces user requests as they travel through the entire application. It then aggregates this trace
data so that we can see more detailed information not only about the request and response, but also about calls that our application makes to downstream. These downstream calls can include AWS resources, microservices, databases, and HTTP web APIs.
Supporting AWS X-Ray with OpenTelemetry Go SDK
If you are working in the observability space, you may already know all about the OpenTelemetry project. OpenTelemetry is a fast-developing open source project under the Cloud Native Computing Foundation (CNCF) that provides a set of APIs and SDKs for robust and portable telemetry of cloud-native software. The project enables customers to send telemetry data to multiple different backends for further processing while only having to instrument their application once. The different types of telemetry data that can be sent include traces, metrics, and logs. In this blog post, we will talk about sending traces to the AWS X-Ray backend.
The Go SDK for OpenTelemetry allows engineers to gather data from Go-specific applications. Once the data is collected, it is then sent to backends, such as Prometheus, Jaeger, and AWS X-Ray, to provide further insight into application failures and application performance metrics. What makes AWS X-Ray useful is that it allows engineers to observe their Go applications that are running on AWS-specific services, such as Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS).
To support sending traces from Go applications instrumented with OpenTelemetry to AWS X-Ray, we added four components to the OpenTelemetry Go SDK: the trace ID generator, propagator, ECS resource detector, and the EKS resource detector.
The preceding diagram shows microservices and applications that are instrumented with OpenTelemetry sending traces to X-Ray. The trace data is then aggregated and filtered in order to be analyzed by application developers.
Building the ID generator
The first component we built was the trace ID generator. To build an X-Ray trace ID generator, it is helpful to first understand what a trace ID is. A trace ID is used to uniquely identify a trace in distributed tracing. This is useful because it can be used to group together all spans
for a specific trace
across all processes. The trace ID is a 32-hex-character lowercase string that is generated when a root span is created. For spans
that have a parent, the trace ID is the same as the parent span’s trace ID because they both belong to the same trace
.
In OpenTelemetry, the creation of OTLP trace ID uses the W3C trace format, which generates a random unique 32-hex-character lowercase string. However, to use OpenTelemetry tracing with X-Ray, we needed to override the OTLP trace ID creation function. This is because X-Ray does not use the W3C trace format; rather, it uses a different format in which the first 8-hex-digits represents the timestamp at which the trace is generated and the remaining 24-hex-digits are randomly generated.
Example of an X-Ray trace ID:
In the diagram below, we show how the X-Ray ID Generator is used with client applications that are instrumented with OpenTelemetry. For client applications that use the OpenTelemetry Go SDK, a TraceProvider
object is created in order to start collecting and sending trace data to other backends (e.g., Prometheus, Jaeger, etc.). Inside the TraceProvider
object, we can specify which implementation of the IDGenerator to use.
We implemented the X-Ray ID Generator by making a single struct with two functions. The first function is NewSpanID()
, which was responsible for generating a new spanID
. For this function, we used the same implementation that OpenTelemetry uses. The second function that we had inside the ID Generator struct was the NewTraceID()
. For this function, we used our own implementation for generating a new traceID
. This is because X-Ray accepts a different traceID
format than the standard W3C trace format.
In the following diagram, we summarize the steps that the NewTraceId()
function takes to generate an X-Ray traceId
. To get the first 8 digits, we convert the current human date timestamp to an epoch timestamp, which is then converted to hexadecimal. The remaining 24 digits belong to a randomly generated hexadecimal number. Together, we concatenate the first 8 digits and the remaining 24 digits to make up the 32-digit hexadecimal X-Ray trace ID.
The source code for the ID generator component follows:
package xrayidgenerator
import (
crand "crypto/rand"
"encoding/binary"
"encoding/hex"
"math/rand"
"strconv"
"sync"
"time"
"go.opentelemetry.io/otel/trace"
)
type IDGenerator struct {
sync.Mutex
randSource *rand.Rand
}
// NewIDGenerator returns an idGenerator used for sending traces to AWS X-Ray
func NewIDGenerator() *IDGenerator {
gen := &IDGenerator{}
var rngSeed int64
_ = binary.Read(crand.Reader, binary.LittleEndian, &rngSeed)
gen.randSource = rand.New(rand.NewSource(rngSeed))
return gen
}
// NewSpanID returns a non-zero span ID from a randomly-chosen sequence.
func (gen *IDGenerator) NewSpanID() trace.SpanID {
gen.Lock()
defer gen.Unlock()
sid := trace.SpanID{}
gen.randSource.Read(sid[:])
return sid
}
// NewTraceID returns a non-zero trace ID based on AWS X-Ray TraceID format.
// (https://docs.thinkwithwp.com/xray/latest/devguide/xray-api-sendingdata.html#xray-api-traceids)
func (gen *IDGenerator) NewTraceID() trace.TraceID {
gen.Lock()
defer gen.Unlock()
tid := trace.TraceID{}
currentTime := getCurrentTimeHex()
copy(tid[:4], currentTime)
gen.randSource.Read(tid[4:])
return tid
}
func getCurrentTimeHex() []uint8 {
currentTime := time.Now().Unix()
// Ignore error since no expected error should result from this operation
// Odd-length strings and non-hex digits are the only 2 error conditions for hex.DecodeString()
// strconv.FromatInt() do not produce odd-length strings or non-hex digits
currentTimeHex, _ := hex.DecodeString(strconv.FormatInt(currentTime, 16))
return currentTimeHex
}
Propagator
The second component we built was the propagator, which is used in context propagation. What is context propagation? Simply put, propagation is the ability to correlate events across different services, and also is one of the core principles behind distributed tracing. In a distributed system, components need to be able to collect, store, and transfer metadata (also referred to as a context). For that reason, propagator objects are configured inside tracer objects in order to support transferring of context across difference processes. A context often will have information identifying the current span and trace, and can contain arbitrary correlations as key-value pairs. Propagation is when context is bundled and transferred across services, often via HTTP headers. This is done by first injecting context into a request and then this is extracted by a receiving service, which can then make additional requests, and inject context to be sent to other services, and so on. Together, context and propagation represent the engine behind distributed tracing.
As a result, the purpose of the propagator we built was to be able to support sending traces specifically from OpenTelemetry to X-Ray. We accomplished this by creating an X-Ray-specific propagator that translates the OpenTelemetry SpanContext
into the equivalent X-Ray header format.
The X-Ray header format consists of three parts:
- the root trace ID
- the parent ID
- the sampling decision
The following is an example of X-Ray trace header with root trace ID, parent segment ID, and sampling decision:
X-Amzn-Trace-Id:
Root=1-5759e988-bd862e3fe1be46a994272793;
Parent=53995c3f42cd8ad8;
Sampled=1
Note:
- The root is required, whereas the parent ID and sampling decision are optional.
- Sampling decision refers to the algorithm used by X-Ray to ensure efficient tracing and provides a representative sample of the requests that your application served to determine which requests gets traced. By default, X-Ray records the first request each second, and five percent of any additional requests.
- For more information, see Configuring sampling rules in AWS X-Ray.
When building the X-Ray propagator, we implemented three functions:
inject()
extract()
fields()
We implemented these functions inside a struct shown in the following diagram.
The first function, inject()
, injects the X-Ray values into the header. Our implementation accepted two parameters, which are the context format for propagating spans and the textMapCarrier interface to allow our propagator to be implemented.
Next, the extract()
function we implemented was required to extract the values from an incoming request. An example would be extracting the values from the headers of an HTTP request. Given a context and a carrier, the extract()
function extracts the context values from a carrier and returns a new context. This new context is created from the old context along with the extracted values. For our Go SDK implementation, the extract()
function accepted two parameters: the context and the textMapCarrier interface.
The final function in our propagator was the fields()
function. This function refers to the predefined propagation fields. If the carrier is reused, the fields should be deleted before calling inject()
. For example, if the carrier is a single-use or immutable request object, we don’t need to clear fields because they couldn’t have been set before. If it is a mutable returnable object, successive calls should clear these fields first. This will return a list of fields that will be used by the TextMapPropagator
.
The source code for the propagator component is shown below:
package xray
import (
"context"
"errors"
"strings"
"go.opentelemetry.io/otel/propagation"
"go.opentelemetry.io/otel/trace"
)
const (
traceHeaderKey = "X-Amzn-Trace-Id"
traceHeaderDelimiter = ";"
kvDelimiter = "="
traceIDKey = "Root"
sampleFlagKey = "Sampled"
parentIDKey = "Parent"
traceIDVersion = "1"
traceIDDelimiter = "-"
isSampled = "1"
notSampled = "0"
traceFlagNone = 0x0
traceFlagSampled = 0x1 << 0
traceIDLength = 35
traceIDDelimitterIndex1 = 1
traceIDDelimitterIndex2 = 10
traceIDFirstPartLength = 8
sampledFlagLength = 1
)
var (
empty = trace.SpanContext{}
errInvalidTraceHeader = errors.New("invalid X-Amzn-Trace-Id header value, should contain 3 different part separated by ;")
errMalformedTraceID = errors.New("cannot decode trace ID from header")
errLengthTraceIDHeader = errors.New("incorrect length of X-Ray trace ID found, 35 character length expected")
errInvalidTraceIDVersion = errors.New("invalid X-Ray trace ID header found, does not have valid trace ID version")
errInvalidSpanIDLength = errors.New("invalid span ID length, must be 16")
)
// Propagator serializes Span Context to/from AWS X-Ray headers.
// Example AWS X-Ray format:
// X-Amzn-Trace-Id: Root={traceId};Parent={parentId};Sampled={samplingFlag}
type Propagator struct{}
// Asserts that the propagator implements the otel.TextMapPropagator interface at compile time.
var _ propagation.TextMapPropagator = &Propagator{}
// Inject injects a context to the carrier following AWS X-Ray format.
func (xray Propagator) Inject(ctx context.Context, carrier propagation.TextMapCarrier) {
sc := trace.SpanFromContext(ctx).SpanContext()
if !sc.TraceID.IsValid() || !sc.SpanID.IsValid() {
return
}
otTraceID := sc.TraceID.String()
xrayTraceID := traceIDVersion + traceIDDelimiter + otTraceID[0:traceIDFirstPartLength] +
traceIDDelimiter + otTraceID[traceIDFirstPartLength:]
parentID := sc.SpanID
samplingFlag := notSampled
if sc.TraceFlags == traceFlagSampled {
samplingFlag = isSampled
}
headers := []string{traceIDKey, kvDelimiter, xrayTraceID, traceHeaderDelimiter, parentIDKey,
kvDelimiter, parentID.String(), traceHeaderDelimiter, sampleFlagKey, kvDelimiter, samplingFlag}
carrier.Set(traceHeaderKey, strings.Join(headers, ""))
}
// Extract gets a context from the carrier if it contains AWS X-Ray headers.
func (xray Propagator) Extract(ctx context.Context, carrier propagation.TextMapCarrier) context.Context {
// extract tracing information
if header := carrier.Get(traceHeaderKey); header != "" {
sc, err := extract(header)
if err == nil && sc.IsValid() {
return trace.ContextWithRemoteSpanContext(ctx, sc)
}
}
return ctx
}
// extract extracts Span Context from context.
func extract(headerVal string) (trace.SpanContext, error) {
var (
sc = trace.SpanContext{}
err error
delimiterIndex int
part string
)
pos := 0
for pos < len(headerVal) { delimiterIndex = indexOf(headerVal, traceHeaderDelimiter, pos) if delimiterIndex >= 0 {
part = headerVal[pos:delimiterIndex]
pos = delimiterIndex + 1
} else {
//last part
part = strings.TrimSpace(headerVal[pos:])
pos = len(headerVal)
}
equalsIndex := strings.Index(part, kvDelimiter)
if equalsIndex < 0 { return empty, errInvalidTraceHeader } value := part[equalsIndex+1:] if strings.HasPrefix(part, traceIDKey) { sc.TraceID, err = parseTraceID(value) if err != nil { return empty, err } } else if strings.HasPrefix(part, parentIDKey) { //extract parentId sc.SpanID, err = trace.SpanIDFromHex(value) if err != nil { return empty, errInvalidSpanIDLength } } else if strings.HasPrefix(part, sampleFlagKey) { //extract traceflag sc.TraceFlags = parseTraceFlag(value) } } return sc, nil } // indexOf returns position of the first occurrence of a substr in str starting at pos index. func indexOf(str string, substr string, pos int) int { index := strings.Index(str[pos:], substr) if index > -1 {
index += pos
}
return index
}
// parseTraceID returns trace ID if valid else return invalid trace ID.
func parseTraceID(xrayTraceID string) (trace.TraceID, error) {
if len(xrayTraceID) != traceIDLength {
return empty.TraceID, errLengthTraceIDHeader
}
if !strings.HasPrefix(xrayTraceID, traceIDVersion) {
return empty.TraceID, errInvalidTraceIDVersion
}
if xrayTraceID[traceIDDelimitterIndex1:traceIDDelimitterIndex1+1] != traceIDDelimiter ||
xrayTraceID[traceIDDelimitterIndex2:traceIDDelimitterIndex2+1] != traceIDDelimiter {
return empty.TraceID, errMalformedTraceID
}
epochPart := xrayTraceID[traceIDDelimitterIndex1+1 : traceIDDelimitterIndex2]
uniquePart := xrayTraceID[traceIDDelimitterIndex2+1 : traceIDLength]
result := epochPart + uniquePart
return trace.TraceIDFromHex(result)
}
// parseTraceFlag returns a parsed trace flag.
func parseTraceFlag(xraySampledFlag string) byte {
if len(xraySampledFlag) == sampledFlagLength && xraySampledFlag != isSampled {
return traceFlagNone
}
return trace.FlagsSampled
}
// Fields returns list of fields used by HTTPTextFormat.
func (xray Propagator) Fields() []string {
return []string{traceHeaderKey}
}
Amazon ECS resource detector
The third component we implemented was the Amazon ECS resource detector. Amazon ECS is a fully managed container orchestration service that makes it easy to run, stop, and manage containers on a cluster. A container defined in a task definition can be used to run individual tasks or tasks within a service. A service is a configuration that enables running and maintaining a specified number of tasks simultaneously in a cluster. Tasks and services can be run on a serverless infrastructure that is managed by AWS Fargate. The ECS resource detector is responsible for detecting whether or not an application or service that’s producing telemetry data is running on ECS.
The ECS resource detector determines whether the application generating telemetry data is running on Amazon ECS. It will then populate ECS-specific attributes in the resource
object, such as the containerID
and hostName
. This comes in handy when trying to troubleshooting a failed request by pinpointing exactly which container was the root case.
For our high-level design of the ECS resource detector, we had a struct containing three functions to help detect whether the application was running inside an Amazon ECS environment.
In the following code block, we show an example usage of the Amazon ECS resource detector:
import (
"context"
"go.opentelemetry.io/contrib/detectors/ecs"
)
func main() {
// Instantiate a new ECS resource detector
detectorUtils := new(ecsDetectorUtils)
ecsResourceDetector := ResourceDetector{detectorUtils}
// If on ECS, resourceObj contains the containerID and hostName
// If NOT on ECS, resourceObj is empty
resourceObj, err := ecsResourceDetector.Detect(context.Background())
}
The full source code for the ECS detector can be found on GitHub.
Amazon EKS resource detector
The last component we implemented was the EKS resource detector. Amazon EKS is a fully managed Kubernetes service that allows you to run and deploy Kubernetes on AWS. Kubernetes is open source and allows organizations to deploy and manage containerized applications, such as platforms as a service (PaaS), batch processing workers, and microservices in the cloud at scale. Through Amazon EKS, organizations using AWS can get the full functions of Kubernetes without having to install or manage Kubernetes itself.
What the EKS resource detector will do is detect whether or not the application that’s generating telemetry data is running on EKS and then populate EKS-specific attributes inside the resource
object. These attributes include the containerID
and clusterName
. This comes in handy when trying to troubleshooting a failed request by pinpointing exactly which container was the root case.
Note: For more details about detecting information for Elastic Kubernetes Service plugin, refer to the AWS X-Ray developer guide (PDF).
For our high-level design for the EKS resource detector, we had a struct containing member variables related to Kubernetes and authentication along with functions to help detect and create the resource
object for supporting EKS detection with OpenTelemetry.
The following diagram shows the flow of function executions for the EKS resource detector. In our implementation, we first invoke the detect()
function, which checks whether the application is running on EKS. This is done by calling the isEks()
function, which uses helper functions, including isK8s()
and getK8sCredHeader()
. Once the helper functions return, if isEks()
is true—meaning the application is running on EKS—then it will call getClusterName()
and getContainerId()
to retrieve the attributes for the resource
object. If isEks()
returns false
—meaning the application is not running EKS—then an empty resource
object will be returned.
In the following code block, we show an example of using the EKS resource detector to extract EKS specific attributes:
import (
"context"
"go.opentelemetry.io/contrib/detectors/eks"
)
func main() {
// Instantiate a new EKS Resource detector
detectorUtils := new(ecsDetectorUtils)
eksResourceDetector := ResourceDetector{detectorUtils}
// If on EKS, resourceObj contains the containerID and clusterName
// If NOT on EKS, resourceObj is empty
resourceObj, err := eksResourceDetector.Detect(context.Background())
}
The full source code for the EKS detector can be found on GitHub.
Testing strategy
The testing process that we used was test driven development (TDD). This is because TDD ensures that there are fewer bugs and has better code coverage. In TDD, code is only written after a test case is written, which ensures that the code being written is covered by a unit test. The process of TDD is shown below.
Testing tools we used
We used two different tools for testing: the Go testing
package and the Go testify
library. We chose to use the testing
package because this came by default with Go, which made writing and executing unit tests easy. We chose to use the testify
library because this library added extra functionality that we could utilize in our unit tests, such as performing assert statements and mocking functions. In the next sections we will provide a brief overview of each of these tools.
Go testing
package
This package is the standard unit test package for Go. It provides support for automated testing of Go packages and uses the go test
command to run the tests. To use the package, we include it as an import in our test file (import "testing"
).
To write a new test suite, we followed these steps:
- Create a new file whose name ends with
_test.go
. - Put the test file in the same package as the code.
- Write unit test functions inside our newly create test file.
- Execute the unit tests by using the following commands:
// Run only one specific unit test
$ go test -run
// Run entire unit test file
$ go test
To write a unit test, we used the following format:
func TestXxx(*testing.T){
// Unit test code goes here
}
- The function name begins with the word
Test
. - The rest of the function name describes the unit test and uses camel-case starting with an uppercase letter.
- The function must take an argument of
*testing.T
.
To learn more about using the Go testing
package, visit the official package docs.
Go testify
library
The testify
library can be used in conjunction with the testing
package in order to perform assertions and mocking.
An assertion is a Boolean expression at a specific point in a program, which will be true unless there is a bug in the program. A test assertion is defined as an expression, which encapsulates some testable logic specified about a target under test.
Benefits of assertions:
- Detect subtle errors that might go unnoticed.
- Detect errors sooner after they occur.
- Make a statement about the effects of the code that is guaranteed to be true.
Here, in the following code, we show an example of using the testify
library to perform an assert statement:
func TestSum(*testing.T){
expectedSum := 8
actualSum = computeSum(5, 3)
assert.Equal(t, expectedSum, actualSum, "sum is incorrect")
}
Mocking with the testify
library
Mock testing is an approach to unit testing that lets us test a specific service or component in isolation to other services and APIs. A typical example of where to use mock testing is if we are trying to unit test a function that makes an HTTP request, we wouldn’t want to make an HTTP request each time the test runs. Rather, we will mock the HTTP request and return a value that we expect. The benefit of this approach is that we are not dependent on testing whether the HTTP request works; rather, we can now test our function with the assumption that the HTTP request is successful.
To mock a function using the testify
library, the following steps must be executed:
- In the main code file, create an
interface
that includes the list of functions to mock. - In the main code file, create a
struct
that will implement theinterface
created in step 1. - In the main code file, implement the
interface
by creating the functions defined in theinterface
. - In the unit test file, create a mock
struct
. - In the unit test file, create mock functions.
- In the unit test file, call the mock functions in the unit tests.
Example:
my_service.go
// step 1 interface
type Database interface {
connect() error
sendMessage(*string) error
}
// step 2 struct
type SQLDatabase struct {}
// step 3 implementing interface functions
func (db SQLDatabase) connect() error {
// function implementation goes here
}
// step 3 implementing interface functions
func (db SQLDatabase) sendMessage(*string) error {
// function implementation goes here
}
my_service_test.go
import "github.com/stretchr/testify/mock"
// Step 4 create a mock struct
type MockDatabase interface {
mock.Mock
}
// Step 5 create mock functions
func (mockDb *MockDatabase) connect() error {
// mock function implementation goes here
}
// Step 5 create mock functions
func (mockDb *MockDatabase) sendMessage(*string) error {
// mock function implementation goes here
}
// Step 6 use mock functions in unit test
func TestConnect(t *testing.T) {
mockDb := new(MockDatabase)
// When calling connect(), it will return nil
mockDb.On("connect").Return(nil)
// Rest of unit test...
}
// Step 6 use mock functions in unit test
func TestSendMessage(t *testing.T) {
mockDb := new(MockDatabase)
// When calling sendMessage("test message"), it will return nil
mockDb.On("sendMessage", "test message").Return(nil)
// Rest of unit test...
}
Conclusion
While working on this project, we learned a lot about the process of developing and delivering high-quality open source code. Because OpenTelemetry is an open source project, we learned how to work with a large community of contributors, including developers from several organizations. We were able to learn and embrace the open source principles of transparency and discussion. Throughout this project, the key success factors included documenting the requirements, design, and test plan, along with joining SIG meetings and discussions to stay active.
Working with the OpenTelemetry community has been an amazing experience, and we invite you to join us. To learn more, check out the OpenTelemetry membership page. Join the discussions, test new features, and contribute your ideas and experience. We hope to see you there!
References
- OpenTelemetry project website
- OpenTelemetry GitHub repository
- OpenTelemetry specifications
- AWS Distro for OpenTelemetry website
- Getting started with AWS X-Ray
- Getting started with Amazon Elastic Container Service (ECS)
- Getting started with Amazon Kubernetes Service (EKS)
- Getting started with Go
- Go
testing
package - Go
testify
library
The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.