HIGHFLEET’s Semantic Federation and Analyses Case Study

NOTE: This Case Study is an analog of a similar client problem that we cannot describe under the terms of our Non-Disclosure Agreement with the client. Nevertheless, this analog closely resembles the actual client problem.

Importantly, the Approach and Benefits described here match what we actually achieved, and the Metrics for database federation are actual metrics. The volume of information semantically federated in such a short time is by no means a limit; we perform larger federations as well. This Solution applies to any database federation problem and provides the additional benefit of powerful analytics “baked in” to the solution. Just as importantly, we apply our capabilities to Data Cleansing.

Michael Davis
President/CEO
HIGHFLEET, Inc.
davis@highfleet.com
m 410.507.1339


Problem Statement and Précis of HIGHFLEET Benefits

The Problem – Diverted (Stolen) High Value Machine Parts

Our client, a precision machine parts manufacturer (the Firm), has a problem with product diversion. These machined parts are expensive, difficult to manufacture, and crucial to industries such as aviation, shipbuilding, defense systems, and automotive, to name a few. Diversion means a product is taken from its intended market or end users and sold somewhere else. Diversion is theft. The diverted parts are sold to end users who have no contracts with the manufacturer and no support agreements, and this puts lives at risk. The diverted parts are typically repackaged and sold at reduced prices. And, amazingly, they are usually offered online. Shutting down a website is not effective, nor does it reveal the mechanisms and sources of Diversion.

Before HIGHFLEET’s work, the Firm had no effective means of understanding its Diversion problem, and therefore no means of acting on it.

Let’s look at the Diversion problem.

The manufacturer makes these items at different plants around the globe, and the same kind of part might be made in several different factories. Many factories were acquired through mergers. Details of a part, such as etched code numbers, tooling marks, finishes, and minute differences in metallic composition, can be attributed to a particular manufacturing site. Packaging details also indicate where a part was manufactured, but diverted parts are usually repackaged.

Having found suspected diverted parts, investigators would make a “buy” and then have the parts analyzed in a lab to determine which plant they came from, in order to gain insight into the Diversion operation. For example:

  • At what point did the part “go off” the normal distribution chain?
  • The answer gives insight into whether there is a security issue at the plant, with the shipper, at the warehouse, or at some other point in marketing and distribution.

Unfortunately, the manufacturer’s supply chain data and the investigators’ records describe the items in completely different terms, so the two could not easily be compared. This forced the Firm to send the parts to a lab for microscopic examination and metallurgical composition analysis. This entire process incurred several expenses:

  • The expense of making the buy.
  • The expense of hiring the Investigators.
  • The expense of sophisticated laboratory analysis.

Beyond these expenses, not solving the Diversion problem carries its own costs and risks:

  • Loss of revenue from diverted products.
  • Risk to human life from unsupported and improperly installed products.

HIGHFLEET enabled the manufacturer to:

  • Eliminate the need for “buys” of suspected diverted machine parts.
  • Eliminate the need for most investigators.
  • Eliminate the need for expensive lab testing.

Even more importantly:

  • We showed the manufacturer where its diversion risks were. As it turned out, our analysis showed that the diversion risk was NOT where the manufacturer thought.
  • We showed the manufacturer which products were most active as diverted products. Again, thanks to HIGHFLEET’s system, we showed that the part the manufacturer thought was the highest diversion risk was NOT; other parts were.
  • This gave the manufacturer the information needed to stop the problems, stemming revenue losses and reducing risk.


HIGHFLEET Solution - Semantic Federation of Legacy Databases (SemFed) with our Reasoner-Provided Analysis and Sense Making



How – From Model to Data Federation to Analyses

Our approach: Creating the Diversion Model.

HIGHFLEET’s Ontologists used our Integrated Ontology Development Environment (IODE) to build a rich, high-fidelity Ontology/Logic Model of the Diversion Problem. The IODE supports large Ontologies and large data sets for testing. From here on, we use the term Logic Model for Ontology.

Our Diversion Logic Model drew on four types of sources.

  1. Subject Matter Experts’ pre-existing ideas, e.g. “A part which is sold online at too low a price is likely diverted.” This and other expert knowledge was captured in the IODE, forming part of the Diversion Logic Model.
  2. Our logic and modeling discipline, e.g. what is a “part”? Is it a type of thing, like “M6-1.0 x 16mm Phillips Pan Head Metric Screw”, or a particular object in the world, like “the M6-1.0 x 16mm Phillips Pan Head Metric Screw I bought from Home Depot in a package of 10 yesterday”? Such distinctions seem obviously essential in a Diversion analysis (how else can you distinguish a diverted part from a legitimate one?), yet none of the data sources made them.
  3. Our analysis of the customer’s data, e.g. “Part prices vary with time, country of sale, and exchange rates.”
  4. Pre-existing HIGHFLEET models, e.g. geographic models, event models, etc.
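The type/instance distinction and the SME pricing rule above can be illustrated with a small Python sketch. This is not HIGHFLEET’s Logic Model, which is expressed in first-order logic in the IODE; it only shows why the rule needs both a part type (which carries a list price) and a particular part instance (which is what actually gets sold). The class names, the single-currency list price, and the 60% “too low” threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartType:
    """A kind of part, e.g. a catalog item defined by its specification."""
    spec: str          # hypothetical identifier for the type
    list_price: float  # manufacturer's price for this type (one currency assumed)


@dataclass(frozen=True)
class PartInstance:
    """A particular physical object, e.g. one screw from one package."""
    serial: str
    part_type: PartType


@dataclass(frozen=True)
class OnlineSale:
    """A sighting of one part instance offered online at some price."""
    item: PartInstance
    price: float


# Hypothetical threshold: "too low" means below 60% of the type's list price.
TOO_LOW_FRACTION = 0.6


def likely_diverted(sale: OnlineSale) -> bool:
    """SME rule: a part sold online at too low a price is likely diverted.

    The price belongs to the sale of an instance, but it is compared
    against the list price of the instance's type.
    """
    return sale.price < TOO_LOW_FRACTION * sale.item.part_type.list_price


screw = PartType("M6-1.0 x 16mm Phillips Pan Head Metric Screw", 2.00)
print(likely_diverted(OnlineSale(PartInstance("SN-001", screw), 0.90)))  # True
print(likely_diverted(OnlineSale(PartInstance("SN-002", screw), 1.50)))  # False
```

A data source that records only “part numbers” without separating the type from the individual item cannot express this rule cleanly, which is the point of the modeling distinction in item 2.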

We encoded the information from each of these sources in machine-understandable logic using the IODE, creating a single Diversion Model. During this process we discovered that to fully understand the domain and the data, we needed to model not only the availability of parts online (the client’s original focus) but also the company’s supply chain practices and the social networking relationships between suspected diverters.

Our approach: Supporting the Diversion Model with data.

We now had a Diversion Model on which our deductive database, the eXtensible Knowledge Server (XKS), could perform analysis by drawing inferences over the Model and the data attached to it. The First Order Logic-based Reasoner in the XKS provides powerful analytics.

But to support the analysis the XKS would do, we first had to federate several sources of data.

We used HIGHFLEET’s Semantic Federation of Legacy Databases solution, SemFed, to semantically federate information from numerous repositories, some native to the Firm and others acquired in Mergers and Acquisitions.

HIGHFLEET uses its sophisticated tools to partially automate development of the Logic Model from the databases and repositories to be semantically federated. Our approach avoids the “many-to-many” mapping problems of other semantic approaches and of traditional methods. And every finished SemFed solution benefits from the analytics and sense making provided by our First Order Logic-based Reasoner.
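One common way to read the “many-to-many” problem is as a count of mappings. Assuming point-to-point integration needs one mapping per pair of sources, while mapping each source once to a shared Logic Model needs one mapping per source, the arithmetic for this project (75 spreadsheets plus 15 databases, as reported under Implementation Cost Savings) can be sketched in Python. Whether SemFed’s internal mapping works exactly this way is an assumption here.

```python
def pairwise_mappings(n_sources: int) -> int:
    """Point-to-point integration: one mapping for every pair of sources."""
    return n_sources * (n_sources - 1) // 2


def shared_model_mappings(n_sources: int) -> int:
    """Hub and spoke: each source is mapped once to the shared model."""
    return n_sources


n = 75 + 15  # spreadsheets + databases federated in this project
print(pairwise_mappings(n))      # 4005
print(shared_model_mappings(n))  # 90
```

Under these assumptions, adding one more source costs n new mappings in the pairwise case but only one in the shared-model case, which is what keeps the cost curve flat.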

Importantly, our SemFed solution allows legacy databases to remain in place for local use, while creating a single semantic federation that, to any user, behaves like a single XKS.

There had been a recent merger, and the company had acquired quite a few products from the merged companies. Data about these “legacy” parts was stored in different formats and different data structures than data about the company’s own original parts. However, our tools and techniques enabled us to unify these different sources in the semantically federated system.

HIGHFLEET Semantic Federation

Our Approach: Data Cleansing

Companies ignore the problem of dirty data at their peril. If you search the web for “cost of bad data” you will find several cogent articles on the problem. Suffice it to say that the estimated dollar costs are staggering. It is worth asking: how many important decisions are companies making based on corrupt data? Many data federation techniques would not catch bad data. Our approach does. Consequently, you not only get database federation at lower cost with powerful analytics “baked in” to our solution, you also get our process for Data Cleansing.

Each item of data added to the model was checked for logical consistency. This checking is possible because our First Order Logic-based Reasoner can apply Integrity Constraints that flag values that violate the logic of the model. As a result, we discovered that the Firm’s product and pricing data, on which the SMEs’ plan for detecting diversion relied heavily, was both self-contradictory (e.g. different prices for the same part in the same country at the same time) and incomplete (no pricing data for some parts that were listed elsewhere). Once HIGHFLEET surfaced these problems, the client set about correcting the data so that it could contribute to our analyses.
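The two kinds of problems found in the pricing data can be expressed as simple integrity checks. The Python sketch below is a stand-in for Integrity Constraints in the XKS Reasoner, not their actual syntax; the record layout (part, country, date, price) and the sample part numbers are hypothetical.

```python
from collections import defaultdict


def find_price_contradictions(price_rows):
    """Integrity check 1: a part must have one price per country and date."""
    prices_by_key = defaultdict(set)
    for part, country, date, price in price_rows:
        prices_by_key[(part, country, date)].add(price)
    # Any key that collected more than one distinct price is self-contradictory.
    return sorted(key for key, prices in prices_by_key.items() if len(prices) > 1)


def find_unpriced_parts(catalog_parts, price_rows):
    """Integrity check 2: every part listed in the catalog needs pricing data."""
    priced = {part for part, _country, _date, _price in price_rows}
    return sorted(set(catalog_parts) - priced)


# Hypothetical pricing records: (part, country, date, price)
rows = [
    ("P-100", "DE", "2024-01-15", 12.50),
    ("P-100", "DE", "2024-01-15", 14.00),  # conflicts with the row above
    ("P-200", "US", "2024-01-15", 8.00),
    ("P-200", "US", "2024-01-15", 8.00),   # duplicate, but not a contradiction
]
print(find_price_contradictions(rows))                         # [('P-100', 'DE', '2024-01-15')]
print(find_unpriced_parts(["P-100", "P-200", "P-300"], rows))  # ['P-300']
```

The first check catches self-contradiction and the second catches incompleteness; in a deductive database both would be stated once as constraints and applied to every record as it is federated.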

Data Cleansing: Use of “Mutator”

As is typical in many companies, each of the Firm’s data sources had been developed in isolation, probably to meet the “one-off” needs of some application, and there was no standard set of identifiers for parts. Rather than developing a standardized naming scheme and applying it to all of the incoming sources, or enforcing a standard nomenclature or data model as in Master Data Management and Data Warehousing, we used an intelligent matching algorithm (internal name “Mutator”) to determine when two parts were the same based on their mechanical specifications. This algorithm could assess the likelihood that two parts were of the same type despite incomplete information, based on the presence of matching attributes and the absence of contradictory attributes. This intelligent matching step was crucial to addressing the client’s problem: only once we knew that a part purchased online was of the same type as a part manufactured by the company could we further analyze the circumstances of the sale (the price, the location of the seller, etc.) to determine whether the part purchased online was diverted.

Further, by using an intelligent matching algorithm instead of a standard data warehousing scheme, we were able to get this analytical result without modifying, moving, or losing up-to-the-minute access to any of the client’s in-place data.
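The matching idea described above (agreeing attributes raise confidence, a contradictory attribute rules a match out, missing attributes are tolerated) can be sketched in Python. This is not the Mutator algorithm itself, whose details are not described here; the attribute names, the scoring formula (agreeing attributes divided by all attributes seen), and the 0.5 threshold are hypothetical.

```python
def match_score(a: dict, b: dict) -> float:
    """Score in [0, 1] for how likely two part records describe the same type."""
    # Only attributes present (and not None) in both records can be compared.
    shared = [k for k in a if k in b and a[k] is not None and b[k] is not None]
    if not shared:
        return 0.0  # nothing to compare, so no evidence of a match
    if any(a[k] != b[k] for k in shared):
        return 0.0  # one contradictory attribute rules out the same type
    # Attributes missing from one record are tolerated but lower confidence.
    all_keys = set(a) | set(b)
    return len(shared) / len(all_keys)


MATCH_THRESHOLD = 0.5  # hypothetical cut-off


def same_type(a: dict, b: dict, threshold: float = MATCH_THRESHOLD) -> bool:
    """Treat two records as the same part type if the score clears the threshold."""
    return match_score(a, b) >= threshold


# Hypothetical records from two different sources
factory = {"thread": "M6x1.0", "length_mm": 16, "head": "pan"}
online = {"thread": "M6x1.0", "length_mm": 16, "drive": "phillips"}
other = {"thread": "M6x1.0", "length_mm": 20}
print(match_score(factory, online))  # 0.5
print(same_type(factory, other))     # False
```

Because the records are compared attribute by attribute rather than by identifier, neither source needs to be renamed or moved into a common schema for the match to work.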

Operational Benefits

Analysis

- Discovery of which products are diverted the most.

- Actionable intelligence for Law Enforcement.

The client required two major pieces of analysis: (1) which products are diverted the most, and (2) actionable intelligence for Law Enforcement, i.e. the Who, What, Where and When.

Thanks to HIGHFLEET’s semantic federation and our Reasoner, an analytic query to our system revealed which products were the most diverted. Surprisingly, they were not the products the client expected. As a result, the client avoided devoting resources to an issue that was not a problem. HIGHFLEET’s analysis also provided sufficient detail to pursue legal action against Diverters.

Having learned what the HIGHFLEET system could do, our client requested additional analysis, for example: where were the diverted products turning up? Our Ontology/Logic Model encompassed Subject Matter Expertise on aspects of the sale of diverted products. Based on this expertise as modeled in our system, we were able to apply geospatial reasoning over the client’s supply chain network to pinpoint the most vulnerable points in the supply chain. The number one offender (a particular manufacturing plant) had been nowhere on the client’s radar before our analysis.
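One simple form of geographic proximity reasoning is to attribute each sighting of a diverted part to the nearest supply chain node and rank the nodes by count. The Python sketch below does this with great-circle (haversine) distance; it is an illustrative assumption, not a description of HIGHFLEET’s geospatial reasoning, and the node names and coordinates are hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rank_nodes_by_nearby_sightings(nodes, sightings):
    """nodes: {name: (lat, lon)}; sightings: [(lat, lon)].

    Returns [(node name, sighting count)], most sightings first.
    """
    counts = Counter()
    for lat, lon in sightings:
        nearest = min(nodes, key=lambda name: haversine_km(lat, lon, *nodes[name]))
        counts[nearest] += 1
    return counts.most_common()


# Hypothetical supply chain nodes and sightings of diverted parts
nodes = {"Plant A": (0.0, 0.0), "Plant B": (0.0, 90.0)}
sightings = [(1.0, 1.0), (0.5, 89.0), (0.0, 88.0)]
print(rank_nodes_by_nearby_sightings(nodes, sightings))  # [('Plant B', 2), ('Plant A', 1)]
```

Proximity alone does not prove where a part left the distribution chain; in this sketch it only ranks candidate nodes for further investigation.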

Analysis – Inherent Agility

HIGHFLEET’s solution allows easy extension of the Logic Model, and because the Logic Model/Ontology reflects the real world, new, unanticipated analytic queries are simple to incorporate. A new query usually requires adding a line or two of code, unlike SQL, which might require pages of complex, error-prone code. Consequently, the Firm continues to request new kinds of analysis from the federated system. For example, most recently we used our social networking model to assess, verify, and expand upon the results of their human investigators. We are currently exploring ways to determine which geographic markets are most vulnerable to safety risks from diverted parts, so that expensive post-sale mitigation measures (like inspecting airplanes) can be focused where they will have the most impact. So what started out as pricing analysis grew into geographic proximity analysis, then into social networking analysis, and continues to grow.

Operational Cost Savings

- Analysis is completed much faster.

- The Firm can sever contracts with vendors using manual methods.

It’s an old saw, but “time is money.” Diversion analysis used to take weeks; now it can be done in minutes, vastly reducing the cost of performing vital analysis, and that is leaving aside the fact that analysis done before HIGHFLEET’s solution was based on faulty data. The HIGHFLEET solution also largely obviated the need for human investigators. By building a comprehensive model linked to data from many sources and exposed to analysis via our Reasoner, our SemFed solution automated what had once been a slow, manual process.

Implementation Cost Savings

- Implementation Metrics.

- Databases remain in place for local use.

- No need for expensive Business Process Re-Engineering.

How many sources federated and how long did it take?

Most database federation projects grow rapidly in cost with each new data source added to the federation. HIGHFLEET’s SemFed keeps the cost curve flat, and we perform federation very rapidly. In total, HIGHFLEET federated 75 very large spreadsheets and 15 extensive databases. Together they held about 45 GB of raw data in 47 tables, each with a primary key, along with 43 foreign keys, 6 unique indexes, and a total of 800 million rows. This is in addition to capturing knowledge from several subject matter experts and instantiating it in our Logic Model. Only 2 employees did all of this in 3 months, including analysis and interface development.

“Changing the engines while the plane is in flight.” - HIGHFLEET’s SemFed solution leaves existing databases and resources in place for local use, causing no disruption to daily operations.

All of these measures contribute substantially to cost savings.

Additional Benefits

Our Approach revealed gaps in the Firm’s Supply Chain Information.

Because we built a Logic Model of the Diversion Problem, our rigorous approach revealed gaps in the Firm’s own Supply Chain data. We therefore made recommendations to the vendors producing data for the Firm regarding their data structures and data entry policies. Many of these were implemented, improving the quality of the data and of the resulting analysis.

HIGHFLEET Semantic Federation

SemFed - This Solution provides flexible, low-cost database federation and analytics, and can also provide increased analytics for Data Warehouses or even a single legacy database.


XKS - The eXtensible Knowledge Server, our scalable deductive database for new database implementations, or for replacing expensive existing databases while providing powerful analytics. The XKS is scaled and priced to fit your needs.


HIGHFLEET Services for:
SemFed implementation
XKS installation
Application Development
Ontology/Logic Model creation
Interface Development
Outsourced Analysis
