This article covers the technical implementation of MissingBias for developers. For information about the algorithm, see the methods section.


All Scientific Microservices endpoints use an API key in combination with an email address for authentication. See Getting Started for more information including error handling and rate limits.


Use Cases

Improve IoT data
Find a bad sensor with a bug that's hard to replicate.
Improve API stability
Ensure your API connections stay live by automatically detecting changes in active endpoints.
Find transactions of interest
Look for high-value transactions or suspicious account numbers in a bank record.

Overview

The MissingBias API processes JSON data representing two lists submitted in a POST request body. The service analyzes the data to determine whether values in the first list are more likely to be missing when the second list displays specific values or value ranges. This is useful in a wide range of scenarios:

  • Detecting field workers that submit incomplete information more often than others (e.g. Customer Email in the first list, Salesperson ID in the second list)
  • Detecting faulty sensors in IoT applications (e.g. Value in the first list, Sensor ID in the second list)
  • Raising alarms when APIs change configuration (e.g. Value in the first list, Source in the second list)
  • Assessing AI training data for poisoning that reduces model accuracy
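
To make the question concrete, here is a minimal local sketch based on the faulty-sensor scenario above. It is not the algorithm the service uses (see the methods section for that); it only tabulates how often the first list is missing for each value of the second list. The function name `missing_rate_by_group` and the sample readings are hypothetical, and the "NA" missing marker is taken from the examples in this article.

```python
from collections import defaultdict


def missing_rate_by_group(values, groups, missing="NA"):
    """Return the fraction of missing entries in `values` for each group."""
    if len(values) != len(groups):
        raise ValueError("values and groups must be the same length")
    totals = defaultdict(int)
    missing_counts = defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += 1
        if value == missing:
            missing_counts[group] += 1
    return {group: missing_counts[group] / totals[group] for group in totals}


# Hypothetical readings: sensor "b" never reports a value.
readings = [21.4, "NA", 20.9, "NA", 22.1, "NA"]
sensors = ["a", "b", "a", "b", "a", "b"]
print(missing_rate_by_group(readings, sensors))  # {'a': 0.0, 'b': 1.0}
```

A large gap between groups, as here, is the kind of pattern the endpoint is meant to flag; the service additionally decides whether such a gap is statistically meaningful.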

Endpoint URL


POST https://api.scientificmicroservices.com/missingbias

Request Format

The request must be an HTTP POST request with a JSON body. The JSON data should be structured as two equal-length lists of values.

Each list can contain either numbers or strings. In the examples below, missing values are marked with the string "NA".
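
As a rough sketch of the format described above, the helper below assembles a request body from two Python lists and rejects lists of unequal length before anything is sent. The helper name `build_payload` is hypothetical. The `col1` and `col2` keys and the "NA" marker are copied from the curl example in this article; mapping Python `None` to "NA" is an assumption of this sketch, not documented behavior.

```python
import json

MISSING = "NA"  # missing-value marker as it appears in this article's examples


def build_payload(col1, col2):
    """Build a column-oriented body shaped like the curl example."""
    if len(col1) != len(col2):
        raise ValueError(
            f"lists must be the same length, got {len(col1)} and {len(col2)}"
        )

    def clean(values):
        # Assumption: represent Python None with the "NA" marker.
        return [MISSING if v is None else v for v in values]

    return {"col1": clean(col1), "col2": clean(col2)}


payload = build_payload([None, 166.445, 470.604], [418.3812, None, 14.552])
print(json.dumps(payload))
```

Checking the lengths locally gives a clearer error than a rejected request would.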

Example request


curl \
--url https://api.scientificmicroservices.com/missingbias \
--header "email: YOUR_EMAIL" \
--header "key: YOUR_KEY" \
--header "Content-Type: application/json" \
--request POST \
--data '{
    "col1":["NA",166.445,470.604,25.0739,49.1652,324.7797,190.9287,"NA",451.39,405.4469,"NA",347.1129,253.0294,141.4462,"NA",241.4338,160.2388,123.1855,51.5936,151.8691,309.7825],
    "col2":[418.3812,"NA",14.552,329.5427,"NA",119.1472,"NA",462.8084,320.5384,148.8701,412.0277,125.1991,"NA",255.8993,441.0706,"NA",297.2804,"NA","NA",296.7565,111.2001]
}'

> {"missing_is_biased":[1]}



Use the MissingBias endpoint at api.scientificmicroservices.com/missingbias to find whether the data missing in this list:
["NA",166.445,470.604,25.0739,49.1652,324.7797,190.9287,"NA",451.39,405.4469,"NA",347.1129,253.0294,141.4462,"NA",241.4338,160.2388,123.1855,51.5936,151.8691,309.7825]

Depends on the values in this list, noting they are in the same order:
[418.3812,"NA",14.552,329.5427,"NA",119.1472,"NA",462.8084,320.5384,148.8701,412.0277,125.1991,"NA",255.8993,441.0706,"NA",297.2804,"NA","NA",296.7565,111.2001]

My key is [YOUR_KEY], and the email to use is [YOUR_EMAIL].


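import requests

# Placeholders: replace with the email address and API key for your account.
YOUR_EMAIL = "YOUR_EMAIL"
YOUR_KEY = "YOUR_KEY"

headers = {
    'email': YOUR_EMAIL,
    'key': YOUR_KEY,
    'Content-Type': 'application/json'
}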

url_bias = "https://api.scientificmicroservices.com/missingbias"

col1 = ["NA", 166.445, 470.604, 25.0739, 49.1652, 324.7797, 190.9287, "NA", 451.39, 405.4469, "NA", 347.1129, 253.0294, 141.4462, "NA", 241.4338, 160.2388, 123.1855, 51.5936, 151.8691, 309.7825]
col2 = [418.3812, "NA", 14.552, 329.5427, "NA", 119.1472, "NA", 462.8084, 320.5384, 148.8701, 412.0277, 125.1991, "NA", 255.8993, 441.0706, "NA", 297.2804, "NA", "NA", 296.7565, 111.2001]

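# Pair the two columns row by row into a list of records. This is the same
# shape the R example produces with toJSON(); the curl example sends the
# columns directly.
sample_data_bias = [
    {"col1": c1, "col2": c2}
    for c1, c2 in zip(col1, col2)
]

response = requests.post(url_bias, headers=headers, json=sample_data_bias)
response.raise_for_status()  # stop here on an HTTP error status

bias = response.json()
print("\n--- Missing Bias ---")
print(bias)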



library(jsonlite)
library(httr)

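url <- "https://api.scientificmicroservices.com/missingbias"

# Placeholders: replace with the email address and API key for your account.
YOUR_EMAIL <- "YOUR_EMAIL"
YOUR_KEY <- "YOUR_KEY"

# Note: mixing "NA" strings with numbers makes c() coerce every element to
# character, so the numeric values below are serialized as strings.
sample_data <- data.frame(
    "col1" = c("NA",166.445,470.604,25.0739,49.1652,324.7797,190.9287,"NA",451.39,405.4469,"NA",347.1129,253.0294,141.4462,"NA",241.4338,160.2388,123.1855,51.5936,151.8691,309.7825),
    "col2" = c(418.3812,"NA",14.552,329.5427,"NA",119.1472,"NA",462.8084,320.5384,148.8701,412.0277,125.1991,"NA",255.8993,441.0706,"NA",297.2804,"NA","NA",296.7565,111.2001))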

sample_json <- toJSON(sample_data)
response <- POST(
  url = url,
  add_headers(  'email'= YOUR_EMAIL,
                'key' = YOUR_KEY,
                'Content-Type' = 'application/json'
  ),
  body = sample_json,
  encode = "json"
)

bias <- fromJSON(content(response, as = 'text'))
print(bias)

Response Format

The endpoint responds with a single value indicating whether missing values in the first list (column 1) are associated with the values in the second list (column 2).

A value of 1 indicates that values in the first list are statistically more likely to be missing when the second list takes specific values (or falls within specific ranges, in the case of numeric lists).

See the use case examples, or visit the Laboratory to build a deeper understanding of the many ways in which this output can be used in applications from satellite metadata analysis to marketing decisions.

Example response


  > {"missing_is_biased":[1]}

missing_is_biased: Whether there is bias in which values in column 1 are missing.
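
A minimal sketch of reading this response in Python follows. The function name `is_biased` is hypothetical, and the response shape is inferred only from the example above; because the flag could plausibly arrive either as a bare number or wrapped in a one-element list, the sketch accepts both rather than assuming one.

```python
def is_biased(body):
    """Return True when a decoded response body reports bias (flag equals 1)."""
    flag = body["missing_is_biased"]
    if isinstance(flag, list):
        # Unwrap a one-element list such as [1].
        if len(flag) != 1:
            raise ValueError(f"expected a single flag, got {flag!r}")
        flag = flag[0]
    return int(flag) == 1


print(is_biased({"missing_is_biased": [1]}))  # True
print(is_biased({"missing_is_biased": 0}))    # False
```

With the requests package, `body` would be the result of `response.json()`.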

Notes for Data Scientists

  • The order of the input columns matters. The missing_is_biased response will always refer to missing data in the first list submitted.
  • Data type detection is automatic. Verify that the inferred data types match your expectations, especially if dealing with mixed data types in a column.
  • The result is intentionally basic; it is meant to indicate where further exploration may be required.
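
Because type detection happens on the service side, it can help to inspect a list locally before submitting it. The sketch below counts the kinds of values in a list; the function name `summarize_types` and its categories are hypothetical and say nothing about how the service itself infers types.

```python
from collections import Counter


def summarize_types(values, missing="NA"):
    """Count missing markers, numbers, strings, and other types in a list."""
    counts = Counter()
    for value in values:
        if value == missing:
            counts["missing"] += 1
        elif isinstance(value, bool):
            # bool is a subclass of int in Python, so check it first.
            counts["bool"] += 1
        elif isinstance(value, (int, float)):
            counts["number"] += 1
        elif isinstance(value, str):
            counts["string"] += 1
        else:
            counts[type(value).__name__] += 1
    return dict(counts)


# A numeric column with one value accidentally quoted as a string.
print(summarize_types(["NA", 166.445, "470.604", 25.0739]))
```

A column that reports both "number" and "string" entries is a candidate for cleaning before submission.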

Notes for Developers

  • Ensure your server can handle POST requests with JSON bodies. Check the example above for the correct JSON format.
  • Implement proper error logging and monitoring to catch and resolve any server-side issues.
  • Consider adding authentication and rate limiting to secure and manage the API.
  • For optimal performance, especially with larger datasets, consider asynchronous processing and caching the results.
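
One way to approach the caching suggestion is sketched below, assuming identical inputs always produce the same result. The class name `CachedMissingBias`, the helper `send_with_requests`, and the 30-second timeout are hypothetical choices; the header names and URL are taken from the examples in this article. The sender is passed in as a callable so the cache can be tested without network access.

```python
import hashlib
import json

API_URL = "https://api.scientificmicroservices.com/missingbias"


def send_with_requests(payload, email, key):
    """POST the payload using the third-party requests package."""
    import requests  # imported lazily so the cache works without it

    response = requests.post(
        API_URL,
        headers={"email": email, "key": key},
        json=payload,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


class CachedMissingBias:
    """Remember results in memory, keyed by a hash of the request body."""

    def __init__(self, send):
        self._send = send  # callable taking a payload dict, returning a result
        self._cache = {}

    def check(self, col1, col2):
        payload = {"col1": list(col1), "col2": list(col2)}
        # Canonical JSON gives the same key for the same data every time.
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        cache_key = hashlib.sha256(encoded).hexdigest()
        if cache_key not in self._cache:
            self._cache[cache_key] = self._send(payload)
        return self._cache[cache_key]
```

A client could be created with `CachedMissingBias(lambda p: send_with_requests(p, YOUR_EMAIL, YOUR_KEY))`; repeated calls to `check` with the same lists then reuse the stored result instead of issuing another request.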