Details and Features of Data Streaming

Data Delivery

Data Streaming uses different technologies to export data, and the delivery reliability can vary depending on the format and destination chosen.

  • Export via OpenTelemetry (OTLP): When exporting data in OTLP format to destinations such as Dynatrace, Datadog, or AWS S3, completeness limitations inherent to the OpenTelemetry model may apply. This format is better suited to observability scenarios (such as performance monitoring and event tracing), where sample-based analysis is usually sufficient to identify failures and anomalies. It is not suitable for business data that requires complete, exact records.

  • Export in JSON format: Exporting data in JSON format generally offers a more reliable delivery rate.
    This option is recommended when data integrity has a higher priority. Even so, we do not recommend it for critical decision-making systems that depend on 100% of the data to operate correctly.

    The JSON file sent is compressed in gz format.
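Because the JSON export arrives gzip-compressed, a consumer has to decompress it before parsing. The following is a minimal sketch, not an official client: the function name `read_export` is hypothetical, and the internal layout of the file (a single JSON document, with a fallback to one JSON object per line) is an assumption that should be checked against a real export.

```python
import gzip
import json


def read_export(raw: bytes) -> list[dict]:
    """Decompress a gzip payload and return the JSON records it contains.

    Assumption: the file holds either one JSON document (object or array)
    or one JSON object per line. Verify this against a real export.
    """
    text = gzip.decompress(raw).decode("utf-8")
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Fallback assumption: newline-delimited JSON objects.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return parsed if isinstance(parsed, list) else [parsed]


if __name__ == "__main__":
    # Build a tiny compressed sample in memory instead of reading a real file.
    sample = gzip.compress(
        json.dumps({"apiName": "Test API", "resultStatus": 200}).encode("utf-8")
    )
    for record in read_export(sample):
        print(record["apiName"], record["resultStatus"])
```

In practice the bytes would come from the object downloaded from the storage destination rather than from an in-memory sample.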

Observability vs. Data Analysis

Understand the difference between the two concepts that, although related, have distinct goals:

  • Observability: tracks system behavior and health in real time. It uses telemetry data, such as metrics, logs, and traces, and performs analyses through intelligent sampling, which can reduce the data volume when everything is operating normally. The pipeline is optimized for speed and efficiency, prioritizing rapid problem detection.

  • Data Analysis (Analytics): focuses on understanding the business and user behavior. It uses historical data for detailed analyses, report generation, and strategic decision-making. The pipeline prioritizes comprehensiveness.

  • Data Streaming and Sensedia Analytics process data through different pipelines, which can impact the volume and granularity of the available information.

  • Temporary interruptions in networks, messaging systems, or service providers can create gaps in observability, reinforcing the difference between observational and analytical data.

Trace Data and Dimensions

The traces sent in OTLP format are generated on the Sensedia gateway itself. This means they are separate from any distributed trace that your client application may have generated. Currently, only the trace dimension is sent.
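As an illustration of how gateway-generated traces could be consumed, the sketch below walks an OTLP JSON payload and yields one flat record per span, including its duration. The field names (`resourceSpans`, `scopeSpans`, `spans`, `startTimeUnixNano`, `endTimeUnixNano`) come from the OTLP example in the Examples section; the function name `iter_spans` and the output keys are hypothetical.

```python
from typing import Iterator


def iter_spans(payload: dict) -> Iterator[dict]:
    """Yield one flat record per span found in an OTLP JSON payload."""
    for resource_span in payload.get("resourceSpans", []):
        # Collect resource-level string attributes such as service.name.
        resource_attrs = {
            attr["key"]: attr["value"].get("stringValue")
            for attr in resource_span.get("resource", {}).get("attributes", [])
        }
        for scope_span in resource_span.get("scopeSpans", []):
            for span in scope_span.get("spans", []):
                # Timestamps are nanoseconds since the Unix epoch, sent as strings.
                start_ns = int(span["startTimeUnixNano"])
                end_ns = int(span["endTimeUnixNano"])
                yield {
                    "service": resource_attrs.get("service.name"),
                    "trace_id": span["traceId"],
                    "name": span["name"],
                    "duration_ms": (end_ns - start_ns) / 1_000_000,
                }
```

Since these trace IDs originate on the gateway, they should not be expected to match trace IDs produced by a client application's own instrumentation.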

Examples

The examples below show what the generated file looks like for different destinations.

  • OTLP Example

    See an example of a file in OTLP format, sent to AWS S3:

    {
      "resourceSpans": [
        {
          "resource": {
            "attributes": [
              { "key": "service.name", "value": { "stringValue": "checkout-service" } },
              { "key": "service.version", "value": { "stringValue": "1.3.2" } },
              { "key": "host.name", "value": { "stringValue": "ip-10-0-0-15" } }
            ]
          },
          "scopeSpans": [
            {
              "scope": {
                "name": "io.opentelemetry.contrib.mongodb",
                "version": "0.39.0"
              },
              "spans": [
                {
                  "traceId": "d4cda95b652f4a1592b449d5929fda1b",
                  "spanId": "6e0c63257de34c92",
                  "parentSpanId": "1111111111111111",
                  "name": "MongoDB INSERT orders",
                  "kind": "SPAN_KIND_CLIENT",
                  "startTimeUnixNano": "1693666548745123456",
                  "endTimeUnixNano": "1693666548756789012",
                  "attributes": [
                    { "key": "db.system", "value": { "stringValue": "mongodb" } },
                    { "key": "db.name", "value": { "stringValue": "ecommerce" } },
                    { "key": "net.peer.name", "value": { "stringValue": "mongo-primary" } }
                  ],
                  "status": {
                    "code": "STATUS_CODE_OK"
                  }
                }
              ]
            }
          ]
        }
      ]
    }

  • JSON Example

    See an example of a file exported in JSON format for S3, Azure Blob, or GCS:

    {
      "apiName": "Test API",
      "resourceId": 280767,
      "completeUrl": "GET https://api-sample.sensedia-eng.com/sample/test",
      "appDeveloper": "test",
      "apiComponentType": "OPERATION",
      "billing": true,
      "trace": "[{\"timeMillis\":20,\"message\":\"Choosing route between 316 possible alternatives\", ... }]",
      "environmentName": "Staging",
      "operationName": "Test API GET /sample",
      "resultStatus": 200,
      "requestHeaders": "host: randomapi.sensedia.com\nuser-agent: Mozilla/5.0 ...",
      "responseHeaders": "date: Mon, 13 May 2024 13:58:54 GMT\ncontent-type: application/json",
      "billingData": {
        "accessTokenBalance": "4",
        "appBalance": "2",
        "accessTokenBillingQuota": "3",
        "appBillingQuota": "1",
        "billingValue": "123"
      },
      "receivedOnDate": "2024/05/13 13:58:53 +0000"
    }

The example above has been shortened for easier reading.
In real files, the fields trace, requestHeaders, and responseHeaders may contain more extensive information.
A complete example file in JSON format is also available for download.
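Several fields in the JSON record are plain strings that need a second parsing step before analysis. The sketch below handles two of them, based only on the formats visible in the example above: newline-separated `name: value` header strings and the `receivedOnDate` timestamp. The helper names `parse_headers` and `parse_received_on` are hypothetical, and real files may contain variations this sketch does not cover.

```python
from datetime import datetime


def parse_headers(raw: str) -> dict[str, str]:
    """Split a newline-separated 'name: value' string into a dictionary."""
    headers: dict[str, str] = {}
    for line in raw.split("\n"):
        # Split on the first colon only, since values may contain colons.
        name, separator, value = line.partition(":")
        if separator:
            headers[name.strip().lower()] = value.strip()
    return headers


def parse_received_on(value: str) -> datetime:
    """Parse a timestamp shaped like '2024/05/13 13:58:53 +0000'."""
    return datetime.strptime(value, "%Y/%m/%d %H:%M:%S %z")


if __name__ == "__main__":
    record = {
        "responseHeaders": "date: Mon, 13 May 2024 13:58:54 GMT\ncontent-type: application/json",
        "receivedOnDate": "2024/05/13 13:58:53 +0000",
    }
    print(parse_headers(record["responseHeaders"]))
    print(parse_received_on(record["receivedOnDate"]).isoformat())
```

Header names are lowercased here so lookups do not depend on the casing used by the client or backend; if a header repeats, this sketch keeps only the last value.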

  • Folder Structure

    The directory structure varies by storage provider. For AWS S3 (JSON), Azure, and Google Cloud, the organization follows this pattern:

s3://<bucket-name>/<environment>/<year>/<month>/<day>/<hour>/<file>.json

Example:

/analytics/{internal_identifier}/year=2024/month=05/day=13/hour=19/{file_name}.json

{internal_identifier} = Sensedia control point.
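When listing exported files, it can help to recover the time partition from each object path. The sketch below assumes paths shaped like the example above, with `year=`, `month=`, `day=`, and `hour=` segments; the function name `partition_hour` is hypothetical, and the time zone of the partition is not stated in this document, so the result is returned as a naive datetime.

```python
import re
from datetime import datetime
from typing import Optional

# Matches the partition segments seen in the example path, e.g.
# .../year=2024/month=05/day=13/hour=19/...
PARTITION_RE = re.compile(
    r"year=(?P<year>\d{4})/month=(?P<month>\d{2})/day=(?P<day>\d{2})/hour=(?P<hour>\d{2})/"
)


def partition_hour(path: str) -> Optional[datetime]:
    """Return the hour partition encoded in an export path, or None if absent."""
    match = PARTITION_RE.search(path)
    if match is None:
        return None
    return datetime(
        int(match["year"]),
        int(match["month"]),
        int(match["day"]),
        int(match["hour"]),
    )


if __name__ == "__main__":
    # "abc123" and "file.json" are placeholder values for illustration only.
    print(partition_hour("/analytics/abc123/year=2024/month=05/day=13/hour=19/file.json"))
```

Returning `None` for paths without partition segments lets a caller skip unrelated objects in the same bucket or container instead of failing.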

    [Image: example in AWS S3]
