This article covers the technical implementation of MissingBias for developers. For information about the algorithm, see the methods section.
All Scientific Microservices endpoints use an API key in combination with an email address for authentication. See Getting Started for more information including error handling and rate limits.
Use Cases
Overview
The MissingBias API processes JSON data representing two lists submitted in a POST request body. The service analyzes the data to determine whether values in the first list are more likely to be missing when the second list contains specific values or value ranges. This is useful in a wide range of scenarios:
- Detecting field workers that submit incomplete information more often than others (e.g. Customer Email in the first list, Salesperson ID in the second list)
- Detecting faulty sensors in IoT applications (e.g. Value in the first list, Sensor ID in the second list)
- Raising alarms when APIs change configuration (e.g. Value in the first list, Source in the second list)
- Assessing AI training data for poisoning which reduces model accuracy
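To make the question concrete, the following minimal sketch computes the share of missing entries in the first list for each distinct value in the second list. It is not the MissingBias algorithm (see the methods section for that), only an illustration of the kind of pattern the service looks for. The function name `missing_rate_by_group`, the sample data, and the use of `None` to mark a missing value are all assumptions made for this example.

```python
from collections import defaultdict


def missing_rate_by_group(first, second):
    """Return {value in second list: fraction of paired first-list entries that are None}."""
    if len(first) != len(second):
        raise ValueError("both lists must have the same length")
    totals = defaultdict(int)
    missing = defaultdict(int)
    for value, group in zip(first, second):
        totals[group] += 1
        if value is None:  # ASSUMPTION: None marks a missing value
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Illustrative data only: customer emails paired with salesperson IDs.
emails = ["a@example.com", None, "b@example.com", None, None, "c@example.com"]
salesperson = ["S1", "S2", "S1", "S2", "S2", "S1"]

rates = missing_rate_by_group(emails, salesperson)
print(rates)
```

A large gap between groups, as between `S1` and `S2` here, is the sort of situation the endpoint is designed to flag. Whether a gap is statistically meaningful is what the service itself decides.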
Endpoint URL
POST https://api.scientificmicroservices.com/missingbias
Request Format
The request must be an HTTP POST request with a JSON body. The JSON data should be structured as two equal-length lists of values.
Each list submitted to the API can contain either numeric or string values.
Example request
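As a rough sketch, a request of this shape might be assembled in Python as follows. The JSON key names `list1` and `list2`, the use of `null` (Python `None`) for missing values, and the `extra_headers` parameter are assumptions for illustration and are not confirmed by this article. How the API key and email address are passed is described in Getting Started and is deliberately left to the caller here.

```python
import json
import urllib.request

ENDPOINT = "https://api.scientificmicroservices.com/missingbias"


def build_request(first, second, extra_headers=None):
    """Build (but do not send) a POST request carrying two equal-length lists."""
    if len(first) != len(second):
        raise ValueError("both lists must have the same length")
    # ASSUMPTION: the key names "list1" and "list2" are hypothetical placeholders.
    body = json.dumps({"list1": list(first), "list2": list(second)}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    # Authentication headers, if any, are supplied by the caller (see Getting Started).
    headers.update(extra_headers or {})
    return urllib.request.Request(ENDPOINT, data=body, headers=headers, method="POST")


# None is serialized as JSON null; treating null as "missing" is an assumption.
req = build_request(["a@example.com", None, "b@example.com"], ["S1", "S2", "S1"])
```

The request can then be sent with `urllib.request.urlopen(req)` once valid credentials have been added.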
Response Format
The endpoint responds with a single value indicating whether missing values in the first list are associated with the values in the second list.
In all cases, a 1 indicates that values in the first list are statistically more likely to be missing when the second list contains specific values (or ranges of values, in the case of numeric lists).
See the use case examples, or visit the Laboratory to build a deeper understanding of the many ways in which this output can be used in applications from satellite metadata analysis to marketing decisions.
Example response
> {"missing_is_biased": [1]}
| Field | Description |
| --- | --- |
| missing_is_biased | Whether there is bias in which values in the first list are missing. |
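A minimal sketch of reading this response in Python is shown below. It assumes the body is valid JSON with a quoted `missing_is_biased` key whose value is a one-element list, as in the example response. The helper name `is_missing_biased` is hypothetical, and treating any value other than 1 as "not flagged" is also an assumption.

```python
import json


def is_missing_biased(response_text):
    """Return True when the response flags biased missingness in the first list."""
    payload = json.loads(response_text)
    flag = payload["missing_is_biased"]
    # The example response wraps the flag in a one-element list; unwrap it if so.
    if isinstance(flag, list):
        flag = flag[0]
    # ASSUMPTION: only a value of 1 means "biased"; anything else is not flagged.
    return flag == 1


flagged = is_missing_biased('{"missing_is_biased": [1]}')
```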
Notes for Data Scientists
- The order of the input columns matters. The missing_is_biased response will always refer to missing data in the first list submitted.
- Data type detection is automatic. Verify that the inferred data types match your expectations, especially if dealing with mixed data types in a column.
- The summary provided is intended to be basic, and to help in indicating where further exploration may be required.
Notes for Developers
- Ensure your server can handle POST requests with JSON bodies. Check the example above for the correct JSON format.
- Implement proper error logging and monitoring to catch and resolve any server-side issues.
- Consider adding authentication and rate limiting to secure and manage the API.
- For optimal performance, especially with larger datasets, consider asynchronous processing and caching the results.
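The caching suggestion above can be sketched as a small in-memory cache keyed on a hash of the two input lists, so that identical submissions are not sent twice. This is one possible approach rather than a recommended implementation: the `call_api` parameter is a hypothetical caller-supplied function that performs the actual request, and the cache is never evicted.

```python
import hashlib
import json

_cache = {}


def cached_call(first, second, call_api):
    """Return the cached result for this pair of lists, calling call_api on a miss."""
    # Hash the serialized inputs so that large lists yield a compact, stable key.
    serialized = json.dumps([list(first), list(second)]).encode("utf-8")
    key = hashlib.sha256(serialized).hexdigest()
    if key not in _cache:
        # call_api is a hypothetical function that submits the lists to the endpoint.
        _cache[key] = call_api(first, second)
    return _cache[key]
```

Because the key depends on list order and content, reordering the same data produces a separate cache entry and a separate request.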