Edge Analytics – What, Why, When, Who, Where, How


As I have written extensively before, the primary purpose of any data you collect or manage is to derive actionable insights from it using various types of data analytics. A casual search on data analytics will tell you that there are four types: descriptive analytics, diagnostic analytics, predictive analytics, and prescriptive analytics. Descriptive analytics focuses on what happened, diagnostic analytics relays why it happened, predictive analytics previews what is likely to happen, and prescriptive analytics conveys options on what you should do about it. But you'll be missing out on an exciting area called Edge Analytics if you rely solely on this type of classification.


Let's look at the scenario of an offshore oil rig that has hundreds of sensors collecting data but is miles away from any decent data center to process and analyze that data. What if the sensors had access to decentralized processing systems that could perform data analytics and possibly shut off a faulty valve right then and there based on the diagnosis and prediction? Wouldn't that be more efficient than sending all that sensor data back to central data centers miles away and relaying back the same information much later? Yes, that's where edge analytics comes in.


WHAT is Edge Analytics?

Simply put, edge analytics is the collection, processing, and analysis of data at the edge of a network, either at or close to a sensor, a network switch, or some other connected device. With the growing popularity of connected devices driven by the evolution of the Internet of Things (IoT), many industries such as retail, manufacturing, transportation, and energy are generating vast amounts of data at the edge of the network. Edge analytics is data analytics performed in real time and in situ, at the site where data collection is happening. Edge analytics can be descriptive, diagnostic, or predictive.
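To make this concrete, here is a minimal Python sketch of what "analytics at the edge" means in the oil rig scenario. The function name, sensor reading, and the 500 psi safety limit are all hypothetical, invented for illustration.

```python
# Minimal sketch of edge analytics: analyze each sensor reading locally
# and act on site, instead of shipping raw data to a central data center.
# The threshold and readings below are made-up illustration values.

def analyze_at_edge(pressure_psi, safe_limit=500.0):
    """Descriptive + diagnostic step performed at the edge device."""
    if pressure_psi > safe_limit:
        # Act immediately on site, e.g. shut a faulty valve.
        return {"action": "shut_valve", "reading": pressure_psi}
    # Otherwise only a compact status needs to leave the site.
    return {"action": "none", "reading": pressure_psi}

decision = analyze_at_edge(612.0)  # the device decides without a round trip
```

The point is the locality of the decision: the comparison is trivial, but it happens where the data is born, not in a data center miles away.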


WHY Edge Analytics?

Is edge analytics another gimmicky term invented just to make our lives complicated? Not really. Organizations are deploying millions of sensors and other smart connected devices at the edge of their networks at a rapid pace, and the operational data they collect on this massive scale can be a huge problem to manage. Edge analytics offers a few key benefits:

  • The first benefit is reduced latency of data analytics. In many environments such as oil rigs, aircraft, CCTV cameras, and remote manufacturing sites, there may not be sufficient time to send data to a central data analytics environment and wait for results if decisions are to be taken on site in a timely manner. As in the oil rig example in the introduction, it may be more efficient to analyze data on the faulty equipment right there and shut off the valve immediately if needed.
  • The second benefit is scalability of analytics. As the number of sensors and network devices grows, the amount of data they collect grows exponentially, increasing the strain on central data analytics resources. Edge analytics enables organizations to scale their processing and analytics capabilities by decentralizing them to the sites where the data is actually collected.
  • The third benefit is that edge analytics helps get around the problem of low-bandwidth environments. The bandwidth needed to transmit all the data collected by thousands of edge devices grows with the number of devices, and many remote sites may not even have the bandwidth to send data and analysis back and forth. Edge analytics alleviates this problem by delivering analytics capabilities at these remote locations.
  • Lastly, edge analytics will probably reduce overall expenses by minimizing bandwidth usage, scaling operations, and reducing the latency of critical decisions.
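The bandwidth benefit above can be sketched numerically: an edge device that summarizes locally ships a handful of numbers upstream instead of every raw reading. The readings and summary fields below are invented for the example.

```python
# Sketch of the bandwidth benefit: instead of transmitting every raw
# reading, the edge device sends only summary statistics upstream.

def summarize(readings):
    """Reduce a batch of raw sensor readings to a few numbers."""
    n = len(readings)
    return {
        "count": n,
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / n,
    }

raw = [20.1, 20.3, 19.8, 35.6, 20.0]   # thousands of these per hour in practice
summary = summarize(raw)               # only these few numbers leave the site
```

The outlier (35.6) still survives in the `max` field, so the central office sees the anomaly without receiving the raw stream.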

WHEN should edge analytics be considered?

Even though edge analytics is an exciting area, it should not be viewed as a potential replacement for central data analytics. Both can and will supplement each other in delivering data insights, and both models have their place in organizations. One compromise of edge analytics is that only a subset of data can be processed and analyzed at the edge, and only the results may be transmitted over the network back to central offices. This results in a 'loss' of raw data that might never be stored or processed. So edge analytics is fine if this 'data loss' is acceptable. On the other hand, if the latency of decisions (and analytics) is not acceptable, as in flight operations or critical remote manufacturing and energy operations, edge analytics should be preferred.


WHO are the players in edge analytics?

Apart from the smart sensors and connected devices that collect data, edge analytics requires hardware and software platforms for storing data, preparing the data, training the algorithms, and executing the algorithms. Most of these capabilities are increasingly being delivered on general-purpose server/client and software platforms. Intel, Cisco, IBM, HP, and Dell are some of the leading companies driving edge analytics.


WHERE is edge analytics deployed the most?

Given that edge analytics benefits organizations where data insights are needed at the edge, Retail, Manufacturing, Energy, Smart cities, Transportation and logistics vertical segments are leading the way in deploying edge analytics. Some use cases are: retail customer behavior analysis, remote monitoring and maintenance for energy operations, fraud detection at financial locations (ATMs etc.), and monitoring of manufacturing & logistics equipment.


HOW to deliver edge analytics?

Getting to edge analytics is not an overnight task, and it typically involves creating the analytics model, deploying the model, and executing the model at the edge. There are decisions to be made in each of these areas with respect to collecting data, preparing data, selecting the algorithms, training the algorithms on a continuous basis, deploying/redeploying the models, etc. The processing and storage capacity at the edge also plays a key role. Some of the emerging deployment models include decentralized and peer-to-peer models, with pros and cons for each.
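A hedged sketch of that lifecycle, with a deliberately trivial "model" (a mean-plus-three-sigma threshold) standing in for a real analytics model: the model is created centrally from historical data, then deployed to and executed at the edge. All the numbers are invented.

```python
# Create the model centrally, deploy the resulting parameter to the edge,
# execute it there. A real model would be far richer; the split of
# responsibilities is the point of the sketch.

def train_threshold(history):
    """Central step: learn a simple anomaly threshold (mean + 3 sigma)."""
    n = len(history)
    mean = sum(history) / n
    var = sum((x - mean) ** 2 for x in history) / n
    return mean + 3 * var ** 0.5

def execute_at_edge(reading, threshold):
    """Edge step: a cheap comparison the device can run in real time."""
    return "alert" if reading > threshold else "ok"

# "Deploy": only the single learned number travels to the edge device.
threshold = train_threshold([10.0, 10.2, 9.9, 10.1, 10.0])
```

Retraining on a continuous basis, as the text mentions, would mean periodically recomputing `threshold` centrally and redeploying the new value.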




As far as I am concerned, edge analytics is an exciting area, with organizations in the Industrial Internet of Things (IIoT) space increasing their investments year over year. Leading vendor companies are aggressively investing in this fast-growing area. In specific segments such as retail, manufacturing, energy, and logistics, edge analytics delivers quantifiable business benefits by reducing the latency of decisions, scaling out analytics resources, solving the bandwidth problem, and potentially reducing expenses.


About the author:

Ramesh Dontha is Managing Partner and Editor-In-Chief at Digital Transformation Pro, a management consulting and training organization focusing on Big Data, Data Strategy, Data Analytics, Data Governance/Quality and related Data management practices. For more than 15 years, Ramesh has put together successful strategies and implementation plans to meet/exceed business objectives and deliver business value. His personal passion is to demystify the intricacies of data related technologies and latest technology trends and make them applicable to business strategies and objectives. Ramesh can either be reached on LinkedIn or Twitter (@rkdontha1) or via email:  rkdontha AT DigitalTransformationPro.com


Women Influencers In Data


A few months ago, I wrote an article about the influencers in big data. The article resonated with many, and almost all appreciated it. But that's not the point. Soon after that, I read an article about the abysmal percentage of women in technology, especially in the higher echelons. One report put it at anywhere between 11% and 28%, depending on the organizational level.
So I posted on LinkedIn, specifically calling out only the women influencers from my original article. Within 24 hours, 10,000 people had read that post; within 2 days, 20,000. I was blown away. I received messages from all over the world thanking me for the inspiration. I could never have imagined that a simple post could provoke that strong a reaction.
But I should have known better. Sometimes, just the awareness that there are other women who have succeeded and are doing well can itself be a strong inspiration. Growing up, I never realized that I could actually become an engineer until I heard that another person with a similar background, from the same rural area, had actually become one. I realized that awareness is key. As a father of 2 daughters, all these points are very personal to me.
So it got me thinking: why can't I put up a site celebrating the accomplishments of women influencers in the data field, one that can serve as an ongoing inspiration to many people around the world? That's how this project was born. Yeah, it's ambitious. So what? I tested the idea with Carla Gentry, whom I have gotten to know well and who is a source of inspiration for me. She liked it, and off I went.
I am happy to announce that the site is ready, and the link is posted below. To start with, I picked the original 20 women from my original article.
Each of them has her own page on the site, with profile information from either LinkedIn or Twitter and links to her social networks. Additionally, each page shows her latest tweets. This is handy if you are unable to keep up with Twitter overall: you can just visit her page and look at her recent tweets. For example, Carla Gentry is a prolific tweeter, and you can see her latest 50 tweets on her page.
Wait, there is more. To really pick it up a notch, I have prepared a video with their basic info so you can get it all in less than 4 minutes. The link is below.
So there you have it, folks. This has been a passion of mine, and I loved every minute of it. Hope you'll like it as well.
I suggest you start off with this short 4 minute video. To watch the video, go here.
To read about them and stay connected or follow, go to the web site here.
Of course, you can read the original article with all the influencers here. Just be aware that the article was written as a song.
All I ask is a few favors.
Be inspired. And don’t keep the inspiration only to yourself. Share with your friends and co-workers.
If you like any of these, please show it with a ‘like’. If you don’t like it, just let me know what I can do better via comments or messages to me.
I know there are a lot more women influencers out there, so please pass on their names in the comments section below. My intent is to keep adding to this list and keep the site up to date. This is just the beginning.


Data Mining – What, Why, When


One of the best ways to learn about any topic is to start with very fundamental questions like What and Why: the good old Socratic method. In this series of articles on data mining, I plan to approach the topic in a similar fashion.

What is Data Mining?

Simply put, Data mining is the process of sifting through large data sets to identify and describe patterns, discover and establish relationships with an intent to predict future trends based on those patterns and relationships.
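As a toy illustration of this definition, the snippet below sifts a handful of invented shopping transactions for a pattern: the pair of items most often bought together. That discovered relationship is the kind of thing a retailer could use to predict future purchases.

```python
# Sift transaction data for a co-occurrence pattern. The baskets and
# items are made up for the example.
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
    {"milk", "butter"}, {"bread", "milk"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pair is the "discovered pattern"; its count is
# its support in the data set.
pattern, support = pair_counts.most_common(1)[0]
```

Real data mining does this over millions of rows and many dimensions at once, which is exactly why the automation discussed below matters.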

Why is data mining relevant now? Haven’t we been ‘mining’ data from time immemorial?

Yes and no. While it is true that data was always analyzed to identify patterns and predict outcomes, the data that organizations have to deal with exploded in recent times with the advent of big data. These large data sets make it almost impossible to identify multi-dimensional patterns using traditional techniques or tools. Data mining in its modern form, with the latest tools and faster processing, automates the discovery of patterns, the establishment of relationships, and the building of predictive models, making the process far more efficient.

What are some of the specific benefits of data mining?

The broad benefit of identifying hidden patterns, consequent relationships and establishing predictive models can be applied to many functions and contexts in organizations.

Specifically, customer-focused functions can mine customer data to acquire new customers, retain existing customers, and cross-sell to them. Other examples are enhancing customer lead conversion rates and building prediction models for future sales or for new products and services.

Financial sector companies can build fraud-detection models and risk mitigation models. Energy and manufacturing sector can come up with proactive maintenance models and quality detection models. Retailers can build stock placement/replenishment models in stores and assess the effectiveness of promotions and coupons. Pharmaceutical companies can mine large chemical compounds data sets to identify agents for the treatment of diseases.

What skills are needed for data mining?

Data mining sits at the intersection of statistics (the analysis of numerical data), artificial intelligence / machine learning (software and systems that perceive and learn like humans, based on algorithms), and databases. Translating these into technical skills leads to requiring competency in Python, R, and SQL, among others. In my opinion, a successful data miner should also have business context and knowledge, plus the so-called soft skills (teamwork, business acumen, communication, etc.), in addition to the above-mentioned technical skills.

Why? Remember that data mining is a tool with the sole purpose of achieving a business objective (increasing revenues or reducing costs) by accelerating predictive capabilities. Technical skill alone will not accomplish that objective without some business context.

A data point comes from Meta Brown's book "Data Mining For Dummies", where she states:

“A data miner’s discoveries have value only if a decision maker is willing to act on them. As a data miner, your impact will be only as great as your ability to persuade someone — a client, an executive, a government bureaucrat — of the truth and relevance of the information you have to share. This means you’ve got to learn to tell a good story — not just any story, but one that honestly conveys the facts and their implications in a way that is compelling for your decision maker.”

Hope this gives you an overview of data mining and where it can be applicable. In the next article, I plan to go over various data mining techniques. Until then, goodbye.




75 Big Data Terms To Make Your Dad Proud of You on Father’s Day


My earlier article, '25 Big Data Terms You Must Know to Impress Your Date', had a pretty decent response (at least by my standards), and there were requests to add more. Look, it is fairly easy to impress your date: all you may need is a romantic dinner and my '25 Big Data terms' cheat sheet. To impress your parents, and especially your father, though, is a totally different ball game. That's why I am upping my game to add at least 50 more terms. This may not be sufficient, but it's worth a try.


So if you haven't yet bought your Dad a gift for Father's Day, practice these 50 additional words along with my first list of 25 terms and take him to a nice place for lunch or dinner. You might have a chance to redeem yourself in his eyes.

Just to give you a quick recap, I covered the following terms in my first article: Algorithm, Analytics, Descriptive analytics, Prescriptive analytics, Predictive analytics, Batch processing, Cassandra, Cloud computing, Cluster computing, Dark Data, Data Lake, Data mining, Data Scientist, Distributed file system, ETL, Hadoop, In-memory computing, IoT, Machine learning, MapReduce, NoSQL, R, Spark, Stream processing, and Structured vs. Unstructured Data. Now let's get on with at least 50 more big data terms.


The Apache Software Foundation (ASF) provides many of the Big Data open source projects; currently there are more than 350 of them. I could spend my whole life just explaining these projects, so instead I picked a few popular terms.


Apache Kafka: Kafka, named after that famous Czech writer, is used for building real-time data pipelines and streaming apps. Why is it so popular? Because it enables storing, managing, and processing streams of data in a fault-tolerant way and is supposedly 'wicked fast'. Given that social network environments deal with streams of data, Kafka is currently very popular.

Apache Mahout: Mahout provides a library of pre-made algorithms for machine learning and data mining and also an environment to create more algorithms. In other words, an environment in heaven for machine learning geeks. Machine learning and Data mining are covered in my previous article mentioned above.

Apache Oozie: In any programming environment, you need some workflow system to schedule and run jobs in a predefined manner and with defined dependencies. Oozie provides that for Big Data jobs written in languages like pig, MapReduce, and Hive.

Apache Drill, Apache Impala, Apache Spark SQL

All of these provide quick, interactive SQL-like interaction with Apache Hadoop data. They are useful if you already know SQL and work with data stored in big data formats (i.e. HBase or HDFS). Sorry for being a little geeky here.


Apache Hive: Know SQL? Then you are in good hands with Hive. Hive facilitates reading, writing, and managing large datasets residing in distributed storage using SQL.


Apache Pig: Pig is a platform for creating query execution routines on large, distributed data sets. The scripting language used is called Pig Latin (No, I didn’t make it up, believe me). Pig is supposedly easy to understand and learn. But my question is how many of these can one learn?


Apache Sqoop: A tool for moving data between Hadoop and non-Hadoop data stores like data warehouses and relational databases.

Apache Storm: A free and open source real-time distributed computing system. It does for continuous, unbounded streams of data what Hadoop does for batch processing: it makes them easier to process reliably and with very low latency.


Artificial Intelligence (AI): Why is AI here? Isn't it a separate field, you might ask. All these trending technologies are so connected that it's better for us to just keep quiet and keep learning, OK? AI is about developing intelligent machines and software in such a way that the combination of hardware and software is capable of perceiving its environment, taking necessary action when required, and learning from those actions. Sounds similar to machine learning? Join my 'confused' club.


Behavioral Analytics: Ever wondered how Google serves ads about products or services that you seem to need? Behavioral analytics focuses on understanding what consumers and applications do, as well as how and why they act in certain ways. It is about making sense of our web surfing patterns, social media interactions, and e-commerce actions (shopping carts, etc.), connecting these seemingly unrelated data points, and attempting to predict outcomes. Case in point: I received a call from a resort vacations line right after I abandoned a shopping cart while looking for a hotel. Need I say more?


Brontobyte: A 1 followed by 27 zeroes, said to be the size of the digital universe of tomorrow. While we are here, let me also mention the Terabyte, Petabyte, Exabyte, Zettabyte, and Yottabyte.


Business Intelligence (BI): I’ll reuse Gartner’s definition of BI as it does a pretty good job. Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.


Biometrics: This is all the James Bondish technology combined with analytics to identify people by one or more of their physical traits, such as face recognition, iris recognition, fingerprint recognition, etc.

Clickstream Analytics: This deals with analyzing users' online clicks as they surf the web. Ever wondered why certain Google ads keep following you even when you switch websites? Big brother knows what you are clicking.


Cluster Analysis: An exploratory analysis that tries to identify structures within data; it is also called segmentation analysis or taxonomy analysis. More specifically, it tries to identify homogeneous groups of cases (i.e., observations, participants, respondents) when the grouping is not previously known. Because it is exploratory, it does not make any distinction between dependent and independent variables. The different cluster analysis methods that SPSS offers can handle binary, nominal, ordinal, and scale (interval or ratio) data.
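For readers who prefer code to prose, here is a hand-rolled sketch of the idea: a two-group k-means on one-dimensional data. Real cluster analysis (in SPSS or elsewhere) handles many variables and data types, but the "find homogeneous groups" mechanic is the same; the values are invented.

```python
# Two-cluster k-means on 1-D data: repeatedly assign each value to the
# nearer centroid, then move each centroid to its group's mean.

def kmeans_1d(values, iters=10):
    c1, c2 = min(values), max(values)  # simple initial centroids
    for _ in range(iters):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

# The grouping is discovered, not given: the algorithm never sees labels.
low, high = kmeans_1d([1.0, 1.2, 0.9, 10.1, 9.8, 10.3])
```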


Comparative Analytics: I'll be going a little deeper into analysis in this article, as big data's holy grail is in analytics. Comparative analysis, as the name suggests, is about comparing multiple processes, data sets, or other objects using statistical techniques such as pattern analysis, filtering, and decision-tree analytics. I know it's getting a little technical, but I can't completely avoid the jargon. Comparative analysis can be used in healthcare to compare large volumes of medical records, documents, images, etc. for more effective and hopefully more accurate medical diagnoses.


Connection Analytics: You must have seen these spider web like charts connecting people with topics etc to identify influencers in certain topics. Connection analytics is the one that helps to discover these interrelated connections and influences between people, products, and systems within a network or even combining data from multiple networks.


Data Analyst: Data Analyst is an extremely important and popular job as it deals with collecting, manipulating and analyzing data in addition to preparing reports. I’ll be coming up with a more exhaustive article on data analysts.


Data Cleansing: This is somewhat self-explanatory and it deals with detecting and correcting or removing inaccurate data or records from a database. Remember ‘dirty data’? Well, using a combination of manual and automated tools and algorithms, data analysts can correct and enrich data to improve its quality. Remember, dirty data leads to wrong analysis and bad decisions.


DaaS: You have SaaS, PaaS, and now DaaS, which stands for Data as a Service. DaaS providers help customers get high-quality data quickly by giving on-demand access to cloud-hosted data.


Data Virtualization: An approach to data management that allows an application to retrieve and manipulate data without requiring technical details of where it is stored, how it is formatted, etc. For example, this is the approach used by social networks to store our photos on their networks.


Dirty Data: Now that Big Data has become sexy, people just keep adding adjectives to Data to come up with new terms like dark data, dirty data, small data, and now smart data. Come on guys, give me a break. Dirty data is data that is not clean, in other words inaccurate, duplicated, or inconsistent data. Obviously, you don't want to be associated with dirty data. Fix it fast.

Fuzzy Logic: How often are we 100% certain about anything? Very rarely. Our brains aggregate data into partial truths, which are again abstracted into the kinds of thresholds that dictate our reactions. Fuzzy logic is a kind of computing meant to mimic human brains by working off partial truths, as opposed to the absolute truths of '0' and '1' in the rest of Boolean algebra. Heavily used in natural language processing, fuzzy logic has made its way into other data-related disciplines as well.
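A one-function sketch of a partial truth: membership in the category "hot" is a degree between 0 and 1 rather than a Boolean. The temperature bounds are chosen arbitrarily for the example.

```python
# Fuzzy membership: instead of hot/not-hot, return a degree of "hotness"
# that ramps linearly between two (arbitrary) bounds.

def hot_membership(temp_c, cold=15.0, hot=35.0):
    if temp_c <= cold:
        return 0.0          # definitely not hot
    if temp_c >= hot:
        return 1.0          # definitely hot
    return (temp_c - cold) / (hot - cold)  # partially hot

degree = hot_membership(25.0)   # 0.5: "somewhat hot"
```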

Gamification: In a typical game, you have elements like scoring points, competing with others, and certain rules of play. Gamification in big data means using those concepts to collect data, analyze data, or generally motivate users.


Graph Databases: Graph databases use concepts such as nodes and edges representing people/businesses and their interrelationships to mine data from social media. Ever wondered how Amazon tells you what other products people bought when you are trying to buy a product? Yup, Graph database!
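A minimal sketch of the node-and-edge idea behind the "people also bought" example, using a plain Python dictionary as a stand-in for a real graph database. The names and products are invented.

```python
# Nodes are people and products; an edge means "this person bought this
# product". Recommendations come from walking the edges around a product.
from collections import Counter

bought = {
    "ann": {"camera", "tripod"},
    "bob": {"camera", "memory card"},
    "cal": {"camera", "tripod", "bag"},
}

def also_bought(product):
    """Rank products co-purchased with `product` by how often they co-occur."""
    counts = Counter()
    for items in bought.values():
        if product in items:
            counts.update(items - {product})
    return [p for p, _ in counts.most_common()]

suggestions = also_bought("camera")   # tripod ranks first (2 co-purchases)
```

A real graph database does the same traversal over billions of edges with indexes built for it; the dictionary here only shows the shape of the query.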

Hadoop User Experience (Hue): Hue is an open-source interface which makes it easier to use Apache Hadoop. It is a web-based application and has a file browser for HDFS, a job designer for MapReduce, an Oozie Application for making coordinators and workflows, a Shell, an Impala and Hive UI, and a group of Hadoop APIs.

HANA: High-performance ANalytic Appliance, a software/hardware in-memory platform from SAP, designed for high-volume data transactions and analytics.

HBase: A distributed, column-oriented database. It uses HDFS for its underlying storage and supports both batch-style computations using MapReduce and transactional interactive queries.

Load Balancing: Distributing workload across multiple computers or servers in order to achieve optimal results and utilization of the system.


Metadata: “Metadata is data that describes other data. Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier. For example, author, date created and date modified and file size are very basic document metadata. In addition to document files, metadata is used for images, videos, spreadsheets and web pages.” Source: TechTarget


MongoDB: MongoDB is a cross-platform, open-source database that uses a document-oriented data model, rather than a traditional table-based relational database structure. This type of database structure is designed to make the integration of structured and unstructured data in certain types of applications easier and faster.


Mashup: Fortunately, this term means much the same as mashups do in our daily lives. Essentially, a mashup is a method of merging different datasets into a single application (for example, combining real estate listings with demographic or geographic data). It's really cool for visualization.

Multi-Dimensional Databases: Databases optimized for online analytical processing (OLAP) applications and for data warehousing. Just in case you are wondering about data warehouses, a data warehouse is nothing but a central repository of data from multiple data sources.

MultiValue Databases: A type of NoSQL, multidimensional database that understands 3-dimensional data directly. They are good, for example, at manipulating HTML and XML strings directly.


Natural Language Processing: Software algorithms designed to allow computers to more accurately understand everyday human speech, allowing us to interact more naturally and efficiently with them.

Neural Network: As per http://neuralnetworksanddeeplearning.com/, neural networks are "a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data". It's been a long time since someone called a programming paradigm 'beautiful'. In essence, artificial neural networks are models inspired by the real-life biology of the brain. Closely related to neural networks is the term deep learning: a powerful set of techniques for learning in neural networks.

Pattern Recognition: Pattern recognition occurs when an algorithm locates recurrences or regularities within large data sets or across disparate data sets. It is closely linked and even considered synonymous with machine learning and data mining. This visibility can help researchers discover insights or reach conclusions that would otherwise be obscured.


RFID: Radio Frequency Identification; a type of sensor using wireless non-contact radio-frequency electromagnetic fields to transfer data. With the Internet of Things revolution, RFID tags can be embedded into every possible 'thing' to generate monumental amounts of data that need to be analyzed. Welcome to the data world 🙂


SaaS: Software-as-a-Service enables vendors to host an application and make it available via the internet. SaaS providers provide services over the cloud.

Semi-structured Data: Data that is not captured or formatted in conventional ways, such as those associated with traditional database fields or common data models. It is also not raw or totally unstructured, and may contain some data tables, tags, or other structural elements. Graphs and tables, XML documents, and email are examples of semi-structured data, which is very prevalent across the World Wide Web and is often found in object-oriented databases.
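A small example of why semi-structured data needs defensive handling: these JSON records share some fields but not a fixed schema, so the code has to tolerate missing tags. The records are invented.

```python
# Semi-structured data in practice: records carry tags, but not every
# record has every tag, so access must be defensive.
import json

records = '''[
  {"name": "sensor-1", "reading": 21.5, "unit": "C"},
  {"name": "sensor-2", "reading": 19.9, "tags": ["outdoor", "north"]}
]'''

parsed = json.loads(records)
# A relational table would force a fixed schema; here we supply defaults.
units = [r.get("unit", "unknown") for r in parsed]
```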


Sentiment Analysis: Sentiment analysis involves the capture and tracking of opinions, emotions or feelings expressed by consumers in various types of interactions or documents, including social media, calls to customer service representatives, surveys and the like. Text analytics and natural language processing are typical activities within a process of sentiment analysis. The goal is to determine or assess the sentiments or attitudes expressed toward a company, product, service, person or event.
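The counting idea behind sentiment scoring can be sketched with a tiny word lexicon; production systems use NLP models rather than word lists, and the lexicon here is invented for illustration.

```python
# Bare-bones lexicon-based sentiment: count positive and negative words
# and compare. The word lists are toy examples, not a real lexicon.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"terrible", "hate", "slow"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

verdict = sentiment("great product love it")   # "positive"
```

Note the obvious limitations: punctuation, negation ("not great"), and sarcasm all defeat a word-count approach, which is why real sentiment analysis leans on the NLP techniques mentioned above.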


Spatial Analysis: Refers to analyzing spatial data, such as geographic or topological data, to identify and understand patterns and regularities within data distributed in geographic space.


Stream processing: Stream processing is designed to act on real-time and streaming data with “continuous” queries. With data that is constantly streaming from social networks, there is a definite need for stream processing and also streaming analytics to continuously calculate mathematical or statistical analytics on the fly within these streams to handle high volume in real time.
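The "continuous query" idea can be sketched as a statistic maintained incrementally over a stream, so the analytic stays current without storing the raw events. The class and values below are invented for illustration.

```python
# A continuous query in miniature: a running mean updated per event,
# never retaining the raw stream.
class RunningMean:
    def __init__(self):
        self.n, self.total = 0, 0.0

    def update(self, x):
        self.n += 1
        self.total += x
        return self.total / self.n   # current value of the "query"

m = RunningMean()
for value in [10.0, 20.0, 30.0]:     # events arriving one at a time
    latest = m.update(value)         # analytic recomputed on the fly
```

Stream processing engines generalize exactly this: many such incremental computations, windowed over time, running over high-volume feeds.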

Smart Data: Smart data is supposedly the data that is useful and actionable after some filtering done by algorithms.


Terabyte: A relatively large unit of digital data; one Terabyte (TB) equals 1,000 Gigabytes. It has been estimated that 10 Terabytes could hold the entire printed collection of the U.S. Library of Congress, while a single TB could hold 1,000 copies of the Encyclopedia Britannica.

Visualization: With the right visualizations, raw data can be put to use. Visualizations of course do not mean ordinary graphs or pie charts; they mean complex graphs that can include many variables while still remaining understandable and readable.


Yottabyte: Approximately 1,000 Zettabytes, or 250 trillion DVDs. The entire digital universe today is about 1 Yottabyte, and this is said to double every 18 months.

Zettabyte: Approximately 1,000 Exabytes, or 1 billion Terabytes.
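The decimal units above can be sanity-checked as powers of 10 bytes:

```python
# Each decimal unit is a power of 10 bytes; each step up multiplies by 1,000.
UNITS = {
    "terabyte": 12, "petabyte": 15, "exabyte": 18,
    "zettabyte": 21, "yottabyte": 24, "brontobyte": 27,
}

def bytes_in(unit):
    return 10 ** UNITS[unit]

# e.g. a zettabyte really is 1,000 exabytes, and a brontobyte
# really is a 1 followed by 27 zeroes.
assert bytes_in("zettabyte") == 1000 * bytes_in("exabyte")
```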



Hope this list was helpful. Please feel free to ‘Like’, ‘Share’ this list with your network, and ‘Follow’ me or ‘Connect’ for my future articles.

