Introduction
With contemporary cloud, mobile, and streaming computing, wireless
automation, and Web technologies in a competitive, data-driven market and an
Internet-based economy, data is exploding and has become ample and ubiquitous
across both the public and private sectors (Gartner, 2013; Information Builder,
2013). Healthcare data such as PHI (personal health information), PII
(personally identifiable information), EHRs (electronic health records), and
EMRs (electronic medical records) is generated exponentially in healthcare
settings and is never at rest. When patients see clinicians or physicians for
diagnostic tests, data about these visits and procedures flows among healthcare
providers, healthcare systems, health insurers, and healthcare networks. Other
data includes enrollment information, physician credentialing, appointments,
fee payment schedules, medical images, and care management documentation.
This healthcare data in motion drives the need to apply analytics to improve
healthcare services, reduce costs, enhance medical R & D, predict the
probability of disease occurrence, and enable actionable decision-making
(Information Builder, 2013; Schneiderman, Plaisant, & Hesse, 2013).
This individual
project in Unit 5 (U5 IP) provides a design proposal for a Hadoop analytics
solution applied to the business problem of analyzing various large data sets
of PHI, PII, EHR, or EMR in a hospital network system. The design proposal of
the Cyber-Healthcare System gives corporate management descriptive information
about, and guidance on, the architecture of a project that solves the business
problem of analyzing huge sets of scattered, complex data in the healthcare
industry in the North-East region, e.g., New Hampshire (NH), Massachusetts (MA),
Rhode Island (RI), and Connecticut (CT). The document also gives interested
readers a general overview of the proposal for using big data analytics in
healthcare to retrieve insights, organized into the following four sections:
I. Network Architecture
II. NoSQL Database - Cassandra
III. Business Plan
IV. Security Policy Proposal
Each section describes the relevant data science details of big data
analytics in health care.
I. Network Architecture
The design proposal of the
Cyber-Healthcare System (The System)
provides an overall description of the solution in data analytics to solve the
challenges and business problems in the healthcare field.
A. Hadoop Ecosystem
The
Cyber-Healthcare System will implement and deploy the Hadoop ecosystem across
hospitals in the area. As the de facto standard for managing big data, Apache
Hadoop, an open-source, Java-based framework that processes data in parallel
across distributed clusters, is chosen for this project (Apache Software
Foundation, 2014). A simplified Hadoop architecture includes four major
components:
1. Hadoop Common
The component
consists of Java libraries and utilities to support other components.
2. Hadoop YARN
The component
schedules jobs and manages cluster resources.
3. HDFS (Hadoop Distributed File System)
The HDFS
provides high-throughput access to application data.
4. Hadoop Map/Reduce
It runs Map
and Reduce functions in parallel over large data sets to retrieve
insightful health information for patients, clinics, and hospitals.
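To make the Map and Reduce steps concrete, the following is a minimal, single-process Python sketch that counts diagnoses per hospital. It only imitates the map, shuffle, and reduce phases in memory; a real job would run on Hadoop itself (for example, through the Java MapReduce API or Hadoop Streaming). The record layout and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input records: (hospital, diagnosis) pairs extracted from EHRs.
Record = tuple[str, str]
Key = tuple[str, str]


def map_phase(records: Iterable[Record]) -> Iterator[tuple[Key, int]]:
    """Map: emit ((hospital, diagnosis), 1) for every record."""
    for hospital, diagnosis in records:
        yield (hospital, diagnosis), 1


def shuffle_phase(pairs: Iterable[tuple[Key, int]]) -> dict[Key, list[int]]:
    """Shuffle: group emitted values by key, as Hadoop does between map and reduce."""
    groups: dict[Key, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[Key, list[int]]) -> dict[Key, int]:
    """Reduce: sum the counts emitted for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(records: Iterable[Record]) -> dict[Key, int]:
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    sample = [
        ("Hospital A", "Vertigo"),
        ("Hospital A", "Diabetes"),
        ("Hospital A", "Vertigo"),
        ("Hospital B", "Diabetes"),
    ]
    for key, count in sorted(run_job(sample).items()):
        print(key, count)
```

On a cluster, many mappers would run over different HDFS blocks at once, and the framework, not this code, would perform the shuffle.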
Figure 1 shows
a simplified Hadoop framework with four components: YARN Frameworks, Common
Utilities, HDFS, and Map/Reduce Computation.
Source: Adapted from
Hadoop Software Foundation, 2012.
Figure 2 displays a high-level HDFS architecture with a NameNode and multiple
DataNodes for data processing.
Source: Adapted from Borthakur (Apache Hadoop
Organization, 2012).
Healthcare tools in the Cyber-Healthcare
System are designed to assist health authorities in long-term plans, business
strategies, and healthcare policies. The healthcare tools include diagnostic
tools for monitoring, evaluating, and assessing data. Other tools support
priority scheduling, identifying effective strategies, evaluating costs,
planning resources, calculating budgets, and planning and implementing tasks.
B. External interfaces
The
Cyber-Healthcare System will allow users such as frontline healthcare
providers, nurses, physicians, and administrators to enter data or to view and
search health information in the Hadoop ecosystem. Patients can access and view
only their own health records, whereas administrators and designers have full
privileges to read, write, delete, save, or change data and files.
1. User interface
There
is a single graphical user interface (GUI) for four types of users, each with a
different level of privilege:
- Patients receive basic privileges on the GUI: they can read, view, and print
their individual health records and information, schedule appointments, send
emails with questions, etc.
- Nurses, physicians, and data entry workers receive more privileges on the
GUI: they can read and view health information, enter data, search or query
for useful information, etc.
- Administrators and designers have full privileges on the GUI, such as read,
write, delete, change, query, and extract data.
- Medical researchers have high-level privileges to access data and information
in the System for their work.
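The four privilege levels described above can be modeled as a simple role-based access control (RBAC) table. The Python sketch below is only an illustration: the role names and permission strings are hypothetical labels chosen to mirror the bullets, and the researcher permissions are an assumption, since the text only says "high-level privileges". A production system would enforce these checks on the server side, not only in the GUI.

```python
from enum import Enum


class Role(Enum):
    PATIENT = "patient"
    CLINICIAN = "clinician"    # nurses, physicians, data entry workers
    ADMIN = "admin"            # administrators and designers
    RESEARCHER = "researcher"  # medical researchers


# Hypothetical permission sets mirroring the four user types in the text.
PERMISSIONS: dict[Role, frozenset[str]] = {
    Role.PATIENT: frozenset({"read_own", "print_own", "schedule", "email"}),
    Role.CLINICIAN: frozenset({"read", "enter", "query"}),
    Role.ADMIN: frozenset({"read", "write", "delete", "change", "query", "extract"}),
    Role.RESEARCHER: frozenset({"read", "query", "extract"}),  # assumed
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the role's permission set includes the action."""
    return action in PERMISSIONS[role]


def can_read_record(role: Role, user_id: str, owner_id: str) -> bool:
    """Patients may read only their own record; other roles need 'read'."""
    if role is Role.PATIENT:
        return user_id == owner_id
    return is_allowed(role, "read")
```

Keeping permissions in one table means a new user type or privilege change touches data, not the GUI code.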
2. Hardware interface and software
interface
The Cyber-Healthcare System, with a Hadoop environment as its backbone, supports
NoSQL databases, aggregate data models, and key-value databases to perform
map-reduce computations and to store the results of the mappers and reducers in
materialized views with high fault tolerance. Users can work with complex
semi-structured and unstructured data in industry-standard formats such as XML,
JSON, and plain text.
The
hardware interface comprises personal computers, desktops, laptops, smartphones
(e.g., iPhones), tablets (e.g., iPads), etc. (Natarajan, 2012).
Software interface includes:
a. Platforms:
- Windows hosts: Windows 7, 8, 8.1, and 10 (32-bit and 64-bit); Windows Server
2008, 2012, and 2012 R2 (64-bit); Windows Vista SP1 and later (32-bit and 64-bit)
- Mac OS X hosts (64-bit): Mavericks (10.9), Yosemite (10.10), El Capitan (10.11)
- Linux hosts (32-bit or 64-bit): Ubuntu 10.04 to 16.04; Debian GNU/Linux 6.0
(“Squeeze”) and 8.0 (“Jessie”); Oracle Enterprise Linux 5, Oracle Linux 6 and 7;
Red Hat Enterprise Linux 5, 6, and 7; Fedora Core / Fedora 6 to 24; Gentoo
Linux; openSUSE 11.4 to 13.2
- Solaris hosts (64-bit only): Solaris 11; Solaris 10 (U10 and higher)
b. Emulated hardware
- Input devices: standard PS/2 keyboards and mice
- Graphics: standard VGA devices
- Storage: Intel PIIX3/PIIX4 chips, the SATA (AHCI) interface, and two SCSI
adapters (LSI Logic and BusLogic)
- Networking: drivers for Linux kernels version 2.6.25 or later and for
Windows 2000, XP, and Vista
- USB: xHCI, EHCI, and OHCI
C. Data flow diagrams
Healthcare
data is processed in parallel along two paths, a traditional data warehouse
and an advanced Hadoop subsystem, each fed by an ETL (Extract, Transform, and
Load) process, as shown in Figures 3 and 4 below:
Figure 3: Process flow of the large
healthcare data in Map/Reduce functions in Hadoop subsystem.
Source: Adapted from
Intel, 2016.
Figure 4: Data flow in both
traditional data warehouse and Hadoop subsystem in parallelism in the
Cyber-Healthcare System.
Source: Adapted from
Intel, 2016.
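The ETL path in Figures 3 and 4 can be illustrated with a small Python sketch that uses only the standard library. It extracts rows from CSV text, transforms them (trimming whitespace, normalizing illness names, and dropping rows without a patient ID), and loads them into an in-memory SQLite table standing in for the data warehouse. The column names and cleaning rules are assumptions for illustration; in the proposed System the same kind of steps would feed both the warehouse and HDFS.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a hospital system (CSV text).
RAW_CSV = """patient_id,hospital,illness
7742661926, Hospital A ,vertigo
,Hospital B,diabetes
1234567890,Hospital B, Diabetes
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse CSV rows into dictionaries keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, str, str]]:
    """Transform: trim fields, title-case illness names, drop rows with no patient ID."""
    cleaned = []
    for row in rows:
        patient_id = row["patient_id"].strip()
        if not patient_id:
            continue  # incomplete record; a real pipeline might log or quarantine it
        cleaned.append((patient_id, row["hospital"].strip(), row["illness"].strip().title()))
    return cleaned


def load(conn: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> None:
    """Load: insert cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits (patient_id TEXT, hospital TEXT, illness TEXT)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_CSV)))
    print(conn.execute("SELECT * FROM visits ORDER BY patient_id").fetchall())
```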
In the Cyber-Healthcare System, a data flow document can be written in XML
format. For example, a typical XML design document looks as follows:
<?xml version="1.0"?>
<!-- File name: TheCyberHealthcareSystem.xml -->
<CyberHealthcareSystem>
  <Group>
    <Groupname>New Hampshire</Groupname>
    <Hospital>XXX</Hospital>
    <DeptInternalMedicine>AAA</DeptInternalMedicine>
    <!-- … -->
    <DeptIntensiveCare>BBB</DeptIntensiveCare>
    <!-- … -->
    <DeptFamilyCare>BBB</DeptFamilyCare>
    <!-- … -->
    <Clinic>YYY</Clinic>
    <!-- … -->
    <Nursinghome>ZZZ</Nursinghome>
    <!-- … -->
  </Group>
  <Group>
    <Groupname>Massachusetts</Groupname>
    <Hospital>III</Hospital>
    <DeptInternalMedicine>OOO</DeptInternalMedicine>
    <!-- … -->
    <DeptIntensiveCare>PPP</DeptIntensiveCare>
    <!-- … -->
    <DeptFamilyCare>QQQ</DeptFamilyCare>
    <!-- … -->
    <Clinic>JJJ</Clinic>
    <!-- … -->
    <Nursinghome>KKK</Nursinghome>
    <!-- … -->
  </Group>
  <Patient>
    <Patientname>StevenConte</Patientname>
    <PatientID>7742661926</PatientID>
    <PatientDOB>04301975</PatientDOB>
    <PatientAddress>XYZ</PatientAddress>
    <PatientOccupation>zyx</PatientOccupation>
    <PatientAge>46</PatientAge>
    <PatientHeight>5ft8Inch</PatientHeight>
    <PatientWeight>150</PatientWeight>
    <PatientIllness>Vertigo</PatientIllness>
    <!-- … -->
  </Patient>
  <!-- … -->
</CyberHealthcareSystem>
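A document in this format can be read with Python's standard xml.etree.ElementTree module. The sketch below parses a trimmed, well-formed variant of the sample (placeholder values kept, elided parts omitted) and extracts each group's hospitals and each patient's illness. The helper function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed variant of the sample document (placeholders kept, elisions dropped).
SAMPLE = """<?xml version="1.0"?>
<CyberHealthcareSystem>
  <Group>
    <Groupname>New Hampshire</Groupname>
    <Hospital>XXX</Hospital>
    <Clinic>YYY</Clinic>
  </Group>
  <Group>
    <Groupname>Massachusetts</Groupname>
    <Hospital>III</Hospital>
    <Clinic>JJJ</Clinic>
  </Group>
  <Patient>
    <PatientID>7742661926</PatientID>
    <PatientIllness>Vertigo</PatientIllness>
  </Patient>
</CyberHealthcareSystem>
"""


def hospitals_by_group(root: ET.Element) -> dict[str, list[str]]:
    """Map each group name to the hospitals listed under that group."""
    return {
        group.findtext("Groupname"): [h.text for h in group.findall("Hospital")]
        for group in root.findall("Group")
    }


def illness_by_patient(root: ET.Element) -> dict[str, str]:
    """Map each patient ID to the recorded illness."""
    return {
        patient.findtext("PatientID"): patient.findtext("PatientIllness")
        for patient in root.findall("Patient")
    }


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(hospitals_by_group(root))
    print(illness_by_patient(root))
```

Parsing fails with an error on malformed input such as mismatched tags, which is one practical reason to keep data flow documents well-formed.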
D. Overall system diagrams
The
overall Cyber-Healthcare System, based on a Hadoop ecosystem, consists of
Hadoop YARN, a Common Utilities Unit, HDFS, Hadoop MapReduce, and the NoSQL
database Cassandra. Hadoop YARN is the communication and control unit that
provides job scheduling and cluster resource management. The Common Utilities
Unit is a supporting unit that provides libraries and utilities. HDFS provides
access to health data sets. Hadoop MapReduce processes large healthcare data
sets efficiently in parallel. Data sets measured in terabytes are split into
64 MB or 128 MB blocks and stored across multiple low-cost commodity nodes in
HDFS, where Map and Reduce functions retrieve insightful information for end
users such as patients and frontline care providers (e.g., nurses, physicians,
and healthcare technologists). The column-oriented database Cassandra stores
the healthcare data that is fed to HDFS for processing.
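The block arithmetic described above can be sketched in Python. The sketch computes how many fixed-size blocks a file occupies and assigns each block's replicas to distinct DataNodes. It assumes a 128 MB block size and HDFS's default replication factor of 3; the round-robin placement is a deliberate simplification, since real HDFS placement is rack-aware and decided by the NameNode. The function names are hypothetical.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed block size (64 MB was the older Hadoop default)
REPLICATION = 3        # HDFS default replication factor


def num_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks a file occupies; the last block may be only partly filled."""
    return (file_size + block_size - 1) // block_size


def place_replicas(n_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Simplified round-robin placement: each block's replicas go to distinct DataNodes."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        block: [datanodes[(block + i) % len(datanodes)] for i in range(replication)]
        for block in range(n_blocks)
    }


if __name__ == "__main__":
    blocks = num_blocks(1024 * MB)  # a 1 GB file
    print(blocks)  # 8
    print(place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4"])[0])
```

For a 1 TB data set (taken as 1,024 × 1,024 MB), num_blocks gives 8,192 blocks at 128 MB or 16,384 blocks at 64 MB, and with three replicas the cluster stores roughly 3 TB of raw data.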
Figure 5 shows the overall central
Hadoop system with the control unit (Hadoop YARN), the supporting unit (Common
Utilities), and Cassandra in the Cyber-Healthcare System.
Source: Created by ThienSiLe (TSL), December 2016
E. Communication flow chart
In terms of communication, the Cyber-Healthcare System consists of
a central Hadoop ecosystem that connects to four groups, i.e., New Hampshire,
Massachusetts, Rhode Island, and Connecticut, in a star configuration, as shown
in Figure 6 below. Each group is linked to local hospitals, outpatient clinics,
nursing homes, and rehabilitation centers. Each organization has many
departments, and each department has its own care team of frontline care
providers, including physicians, nurses, and family members, who provide
health care services to patients. In addition, the environment group, which
comprises regulators, Medicare, Medicaid, insurance companies, healthcare
purchasers, and research funders, can communicate with institutions such as
hospitals, clinics, nursing homes, and rehabilitation centers.
Figure 6 depicts a high-level communication flow chart
among agencies in the Cyber-Healthcare System.
Source: Created by TSL, December 2016
II. Database
The Cyber-Healthcare System can
use Hadoop HDFS as an inexpensive option for storing healthcare big data.
However, because of the colossal data sets, including patient medical
activities, clinical health information such as electronic health records, and
public health data in the North-East region, the column-oriented database
Cassandra is better suited as the storage database in the Hadoop ecosystem.
Note that HDFS remains the distributed storage layer for data processing in
the System.
A. Health data
Health
data is part of big data, the broad and complex data sets that include both
structured and unstructured data. Healthcare data can be individual health
records, health information, patient-related blogs, mobile data, PDF files, web
log data, forums, website content, spreadsheets, photos, clickstream data, RSS
feeds, word processing documents, medical scanning images, videos, audio files,
RFID tags, social media data, XML patient data, call center transcripts, etc.
Sources of healthcare data include electronic patient record systems, the
Internet, search engines, online social networks such as Facebook and Twitter,
and medical information exchanged or posted by millions of people, e.g.,
patients, healthcare providers, and disease researchers in hospitals, clinics,
nursing homes, and rehabilitation centers across the globe. Data sets can be
replicated and shared among nodes in the scalable distributed clusters of the
Cassandra database system.
Insightful information extracted from healthcare
data can be grouped into three categories (Schneiderman, Plaisant, &
Hesse, 2013):
1. Personal health information
Physicians and patients collect information about their
practices and their own health habits.
2. Clinical health information
Electronic health record systems can improve patient care and provide useful
insights into practical patterns of treatment.
3. Public health information
A large quantity of public health data is collected to
help policy makers make more reliable decisions.
B. Database Framework
Cassandra is a non-relational
data storage system that does not require a relational schema or joins and
relaxes the ACID (Atomicity, Consistency, Isolation, and Durability) properties
to some degree (Sadalage & Fowler, 2012). Cassandra can handle big data,
particularly healthcare data, in massive volumes and various forms, and can
process it quickly to yield useful insights. Cassandra, a contemporary
column-oriented NoSQL database, follows a Bigtable-style data model with a
two-level aggregate structure. The first level is a row identifier that maps to
a set of more detailed values. The second level consists of columns, each with
a column key and a data value. An aggregate is a collection of data that users
interact with as a unit. Data in a column-oriented database can be viewed in
both a row-oriented and a column-oriented way, which allows users to treat
groups of columns as units of data within a row aggregate.
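The two-level aggregate structure can be pictured as a map of maps: the row key selects a row, and within that row, column families group columns keyed by name. The Python sketch below models this with nested dictionaries only to illustrate the shape of the data; it is not Cassandra's API, and the row keys, column family names, and helper functions are hypothetical.

```python
# Row key -> column family -> column name -> value.
Store = dict[str, dict[str, dict[str, str]]]


def put(store: Store, row_key: str, family: str, column: str, value: str) -> None:
    """Write one column value, creating the row and column family if needed."""
    store.setdefault(row_key, {}).setdefault(family, {})[column] = value


def get_row(store: Store, row_key: str) -> dict[str, dict[str, str]]:
    """Read a whole row aggregate (all of its column families) as one unit."""
    return store.get(row_key, {})


if __name__ == "__main__":
    db: Store = {}
    put(db, "7742661926", "profile", "name", "StevenConte")
    put(db, "7742661926", "profile", "occupation", "zyx")
    put(db, "7742661926", "illnesses", "2016-12-01", "Vertigo")
    print(get_row(db, "7742661926"))
```

Reading by row key returns the whole patient aggregate at once, while a single column family (for example, "illnesses") can be read as a smaller unit within it.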
Based on personal health
information such as patient medical activities, clinical health information
such as electronic health records, and public health data stored in the
column-oriented database Cassandra, healthcare data and data analytics hold the
promise of improving the quality of healthcare delivery and have the potential
to enhance patient care, save lives, and lower treatment costs. Big data
analytics (BDA) on healthcare data in Cassandra plays a crucial role in
improving the quality of healthcare, patient treatment, and disease
prevention. The promise of data
analytics and big data in healthcare includes supporting a wide range of
medical and healthcare activities in physicians’ offices, hospital networks,
outpatient clinics, etc. The potential includes improving patient care and
saving lives at lower cost, with many advantages from applying big data in
clinical operations, research and development, public health, evidence-based
medicine, genomic analytics, pre-adjudication fraud analysis, patient profile
analytics, etc.
Typical findings about BDA on healthcare data in Cassandra fall into two
themes:
a. The promise of supporting a wide
range of medical and healthcare functions.
b. The potential to improve the
quality of healthcare delivery at lower cost (Raghupathi & Raghupathi, 2014).
Cassandra provides many
advantages to the System:
- Because it is open source, it is inexpensive and easy to implement.
- Data is replicated to multiple nodes and can be partitioned.
- It is easy to distribute across the network.
- It does not require a fixed schema.
- It scales out and in by adding or removing nodes.
- Its data consistency requirement is relaxed: in terms of the CAP theorem,
Cassandra favors availability and partition tolerance, with tunable
consistency.
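Cassandra's relaxed consistency is tunable per request. A common rule for replicated stores is that, with replication factor N, a read that consults R replicas and a write acknowledged by W replicas are guaranteed to overlap, so the read sees the latest acknowledged write, when R + W > N. The Python sketch below checks that condition and computes a quorum (a majority, floor(N/2) + 1). It is arithmetic only, not a Cassandra client, and the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


if __name__ == "__main__":
    n = 3
    print(quorum(n))                                        # 2
    print(reads_see_latest_write(n, quorum(n), quorum(n)))  # True: 2 + 2 > 3
    print(reads_see_latest_write(n, 1, 1))                  # False: 1 + 1 <= 3
```

With N = 3, QUORUM writes plus QUORUM reads (2 + 2 > 3) give consistent reads, while ONE for both (1 + 1) trades consistency for lower latency and higher availability.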
C. Cassandra is a good fit
The column-oriented database Cassandra
is a good fit for the Cyber-Healthcare System (Sadalage & Fowler, 2012;
Upadrasta & Chungath, 2014):
(1) Event logging
Cassandra, with its ability to store
any data structure, is a good choice for storing patient records, patient
medical activities, doctors’ visits, etc. Each event can be written as a column
under a row key such as the patient ID. Because Cassandra scales writes well,
it is ideal for recording patients’ activities and similar events.
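The event-logging pattern can be sketched as one row per patient whose column names are timestamps, so events stay in time order within the row. The Python sketch below imitates that layout in memory; the class, its methods, and the event strings are hypothetical. In Cassandra itself, the ordering would come from the column comparator (or a clustering column in CQL).

```python
import bisect
from datetime import datetime


class PatientEventLog:
    """In-memory imitation of a row-per-patient event log with time-ordered columns."""

    def __init__(self) -> None:
        # row key (patient ID) -> sorted list of (timestamp, event) columns
        self._rows: dict[str, list[tuple[datetime, str]]] = {}

    def append(self, patient_id: str, when: datetime, event: str) -> None:
        """Write one event column under the patient's row key, keeping time order."""
        row = self._rows.setdefault(patient_id, [])
        bisect.insort(row, (when, event))

    def events_between(self, patient_id: str, start: datetime, end: datetime) -> list[str]:
        """Slice of one row: events with start <= timestamp < end."""
        row = self._rows.get(patient_id, [])
        return [event for when, event in row if start <= when < end]
```

Because each write only appends a column to one row, writes for different patients do not contend, which is the property that makes this pattern scale.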
(2) Content management
Column families in Cassandra
can hold many columns per entry, e.g., tags, categories, links, and trackbacks,
each in its own column. Comments can be stored either in the same row or in a
different keyspace. Similarly, illnesses or diseases can be placed in different
column families (Confino, 2010).
(3) Counters
In analyzing healthcare data,
healthcare providers usually need to count and categorize patients to
calculate analytics or statistics. Cassandra supports counter columns; in the
older Cassandra CLI, a column family is created with CounterColumnType as its
default validation class to hold such counters.
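Counter columns are changed only by incrementing or decrementing them, never by setting a value directly, which suits tallying patients by category. In CQL, for example, a table can declare a column of type counter and update it with a statement of the form UPDATE ... SET visits = visits + 1 WHERE .... Since no Cassandra driver is assumed here, the Python sketch below only imitates that behavior in memory with collections.Counter; the class name and the (state, category) keys are hypothetical.

```python
from collections import Counter


class CategoryCounter:
    """Tallies keyed by (state, category), changed only by increments/decrements."""

    def __init__(self) -> None:
        self._counts: Counter[tuple[str, str]] = Counter()

    def increment(self, state: str, category: str, by: int = 1) -> None:
        """Add 'by' (which may be negative) to the counter, like counter = counter + by."""
        self._counts[(state, category)] += by

    def total(self, state: str, category: str) -> int:
        """Current value; missing counters read as 0."""
        return self._counts[(state, category)]

    def by_state(self, state: str) -> dict[str, int]:
        """All category tallies for one state."""
        return {cat: n for (st, cat), n in self._counts.items() if st == state}
```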
(4) Expiring usage
Frontline healthcare providers such
as physicians, nurses, and technologists may want to show patient records,
treatment predictions, preventive care, etc. for only a limited time. They can
do so by using expiring columns: Cassandra keeps the information for a given
time and then deletes it automatically. This time is known as the TTL (time to
live) and is defined in seconds; the column is deleted after the TTL has
elapsed.
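Expiring columns can be illustrated with a small in-memory store in which each value carries a TTL in seconds and is treated as deleted once the TTL has elapsed. The Python sketch below lets the caller pass the current time explicitly so the behavior is easy to follow; the class and method names are hypothetical. In Cassandra, the TTL is set on the write itself (for example, with USING TTL in CQL) and the database removes expired data.

```python
import time
from typing import Optional


class ExpiringColumns:
    """Values that disappear once their TTL (in seconds) has elapsed."""

    def __init__(self) -> None:
        # column name -> (value, absolute expiry time in seconds)
        self._data: dict[str, tuple[str, float]] = {}

    def put(self, column: str, value: str, ttl_seconds: float,
            now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._data[column] = (value, now + ttl_seconds)

    def get(self, column: str, now: Optional[float] = None) -> Optional[str]:
        """Return the value if it has not expired; expired values are deleted lazily."""
        now = time.time() if now is None else now
        entry = self._data.get(column)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._data[column]
            return None
        return value
```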
(5) When not to use Cassandra
Cassandra should not be used for
writes and reads that require ACID transactions, because it does not provide
full transactional consistency. Cassandra is also a poor fit for early
prototypes or initial tech spikes, because query patterns are likely to change
at an early stage, and changing query patterns may require redesigning column
families. Such a change can cost more in Cassandra than a schema change in a
traditional relational database.
III. Business Plan
The world changes constantly and rapidly, with considerable uncertainty and
chaos, and it is almost impossible to predict the future from the known
present. In an aggressively competitive healthcare business environment, many
organizations realize that innovating existing systems, including the
interaction between humans and technology in systems such as the
Cyber-Healthcare System, is important for healthcare services and medical
research and development (R & D). The business plan of the Cyber-Healthcare
System, which uses big data analytics in healthcare, is as follows (VA/VHA
Guidebook, 2011; Bplans, 2016):
1. Executive summary
In an
effort to use analytics to improve healthcare services, reduce costs, and
enhance medical R & D, the Cyber-Healthcare System is proposed to
hospitals, outpatient clinics, nursing homes, and rehabilitation centers in
four states (New Hampshire, Massachusetts, Rhode Island, and Connecticut) for
family medicine practice, teaching, and research. It focuses on the diagnosis
and treatment of patients of all ages. The system also emphasizes preventive
medicine, wellness, and the health of patients.
2. Description
The proposed
Cyber-Healthcare System uses big data analytics in health care in the
North-East region. This proposal gives corporate management descriptive
information about, and guidance on, its architecture and framework. The project
addresses the business problem of analyzing huge sets of scattered, complex
data, particularly about healthcare services, in the healthcare industry.
3. Objectives
a. Creating
a new health care system that consolidates the obsolete and wasteful systems in
the North-East region.
b. Providing
high-quality healthcare to residents in the area.
c. Innovating a medical practice that will exceed patients’ needs and
expectations.
d. Increasing
the number of patients by 20% each year through better services and references.
e. Providing
friendly GUIs (graphical user interfaces) for patients, frontline healthcare
providers, and professional researchers to access information and data (e.g.,
PHI, PII, EHR, or EMR) appropriately from the designated websites.
f. Reducing
doctors’ appointments from 40% down to 5%.
g. Increasing the
average number of visits by 20% in each state.
4. Background
Research
data shows that most well-informed, health-conscious healthcare organizations
can reduce costs, decrease absenteeism, and increase productivity. In the
growing use of big data generation and analysis, IBM is a leader in applying
and integrating BDA in many fields, particularly healthcare. IBM Watson applies
BDA on a cognitive computing platform. It can connect massive, dynamic, and
complex text data across patient records and medical literature to generate
hypotheses involving hundreds of variables for treatment recommendations.
Watson uses big data analytics to process the data for specific patients,
carefully examining their health records, illness histories, and physicians’
studies. The solution is deployed at many academic medical centers and
hospitals, such as Cleveland Clinic and Mayo Clinic. Some institutions partner
with IBM Watson in an ecosystem to provide effective treatments at lower cost
(Power, 2015). Another example is USAA, the financial services provider that
uses IBM Watson to help about 15,000 members a year make the transition from
military to civilian life. USAA and IBM Watson use a software tool, Enhanced
Virtual Assistant (EVA), to execute transactions such as transferring money and
paying bills, and to handle communications, drawing on BDA’s data generation
and analysis across their 140 products.
5. Organizational Assessment
The current
hospitals have supported patient health promotion, but the programs have failed
to improve health outcomes and revenue. A recent survey of about 1,000
patients and their EHRs indicates that patients’ wellness and health are
declining (VA/VHA Guidebook, 2011):
a. 56% of
the patients have a BMI (body mass index) over 25; 23% of patients are obese
(BMI > 30).
b. 25% of
patients report that they use tobacco.
c. 67% of
patients report that they are under stress in the workplace.
d. 95% of
patients report that they want a wellness program.
e. The average reported
claim for back injuries is $442.
Given
current national health trends, the percentage of overweight and obese
patients will likely continue to rise, along with compensation claims.
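The BMI thresholds in the survey can be checked with the standard imperial formula BMI = 703 × weight (lb) / height (in)². The Python sketch below applies it with common adult categories (25 to under 30 overweight, 30 and above obese). Applied to the sample patient record in the XML example (5 ft 8 in), and assuming the weight of 150 is in pounds because the record gives no unit, it yields a BMI of about 22.8. The function names are hypothetical.

```python
def bmi_imperial(weight_lb: float, height_in: float) -> float:
    """Body mass index from pounds and inches: 703 * lb / in^2."""
    if height_in <= 0:
        raise ValueError("height must be positive")
    return 703 * weight_lb / height_in ** 2


def bmi_category(bmi: float) -> str:
    """Common adult BMI categories."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "healthy weight"
    if bmi < 30:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    value = bmi_imperial(150, 5 * 12 + 8)        # sample record; pounds assumed
    print(round(value, 1), bmi_category(value))  # 22.8 healthy weight
```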
6. Proposed Services
Healthcare
organizations that participate in the Cyber-Healthcare System will focus on a
comprehensive patient health promotion program to prevent rising costs and to
enhance the wellness and health of patients by:
- Detecting diseases at
earlier stages.
- Managing individual and
population health efficiently.
- Detecting healthcare fraud
more quickly.
- Using large amounts of historical data to estimate outcomes such as length of
stay, whether to choose elective surgery, and which patients will not benefit
from surgery.
- Identifying patients at risk for medical
complications.
- Identifying patients at risk of
advancing to later disease stages.
- Identifying causal factors of illness
progression.
- Pinpointing patients who are
the greatest consumers of health resources.
- Providing patients with the
information they need to make informed decisions.
- Helping patients manage their own health.
- Tracking healthier behaviors.
- Identifying treatments.
- Reducing readmissions by identifying
lifestyle factors that increase the risk of adverse events.
- Improving outcomes by
examining vitals from at-home health monitors.
- Managing population health by
detecting vulnerabilities within the patient population during disease outbreaks.
7. Target Market Analysis
Among the
current patients in the four states (NH, MA, RI, and CT), 57% are
female. The median age is 50, with the following age distribution: (1) 400,000
patients between 20 and 29, (2) 800,000 patients between 30 and 39, (3) 800,000
patients between 40 and 49, (4) 1,600,000 patients between 50 and 59, and (5)
600,000 patients aged 60 and over (VA/VHA Guidebook, 2011).
The Patient
Health Promotion Program (PHPP) of the Cyber-Healthcare System is open to all
patients in the four states on a voluntary basis. An annual assessment will be
performed for each population group to tailor specific services to its unique
health needs.
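The age distribution can be checked with a short calculation. The five bands add up to 4,200,000 patients, so the median patient is about the 2,100,000th when patients are ordered by age. The cumulative counts (400,000; 1,200,000; 2,000,000; 3,600,000) place that patient in the 50 to 59 band, which is consistent with the stated median age of 50. The Python sketch below reproduces the arithmetic; the band counts are copied from the text, and the function names are hypothetical.

```python
from itertools import accumulate

# Patient counts per age band, as listed in the text.
AGE_BANDS = [
    ("20-29", 400_000),
    ("30-39", 800_000),
    ("40-49", 800_000),
    ("50-59", 1_600_000),
    ("60+", 600_000),
]


def total_patients(bands: list[tuple[str, int]]) -> int:
    return sum(count for _, count in bands)


def median_band(bands: list[tuple[str, int]]) -> str:
    """Return the band containing the middle patient when patients are ordered by age."""
    middle = (total_patients(bands) + 1) / 2
    for (label, _), cumulative in zip(bands, accumulate(count for _, count in bands)):
        if cumulative >= middle:
            return label
    raise ValueError("empty distribution")


if __name__ == "__main__":
    print(total_patients(AGE_BANDS))  # 4200000
    print(median_band(AGE_BANDS))     # 50-59
```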
8. Marketing Plan
New Patient
Orientation, State Human Resources, and Occupational State Health will conduct
awareness classes for target patients and distribute informational materials,
such as booklets and bulletins, through email broadcasts, signage, newspapers,
and fliers. A website with an events calendar is available to patients.
9. Resources
Resources
for implementing the Cyber-Healthcare System, including hardware, software,
computers, workstations, facilities, administrators, and technical support
groups, are operated and controlled by the State Public Health Offices and the
Board of Trustees. Additional resources are reviewed in the annual
assessments. In-kind resources such as Reproduction, Information Resources,
and Medical Media are anticipated to support the PHPP.
IV. Security Policy Proposal
The Cyber-Healthcare System complies
with all regulations, policies, and governance for the medical healthcare
industry in its design as follows:
A. Regulations
In practice, market research and ethics in
healthcare data based on Internet technology are often at odds with each
other. Big Data Analytics (BDA) gives organizations both technical and
strategic capabilities to generate value from the data they store. With the
growth of BI (Business Intelligence) and BDA, there will be more security
violations and privacy issues (Quora, 2014), and a prominent risk of personal
privacy violations. For example, terrorists might hack healthcare systems such
as the Cyber-Healthcare System to sabotage the system, harm people, and advance
their own ideological, political, or religious aims.
Big data’s vital role in advancing medicine can be
constrained by HIPAA (the US Health Insurance Portability and Accountability
Act) and its Privacy, Security, and Breach Notification Rules, which regulate
medical information. These laws, rules, and guidelines govern the collection,
maintenance, transmission, security, and disclosure of PHI, including
electronic PHI, used by healthcare providers, health insurers, or medical R & D
groups. PHI may include social security numbers, driver’s license numbers,
account numbers, photographs, credit or debit card numbers, required security
codes, access codes, passwords, medical information, health insurance
information, usernames, security questions, etc. HIPAA also requires a covered
entity to provide individuals with a notice of its privacy practices. The US
government has enforcement authority. Note that HIPAA does not apply to health
information that is not individually identifiable, such as aggregate data in
NoSQL databases. Nor does it apply to health information used by individuals
or entities that fall outside the definition of covered entities in the HIPAA
Act (Practical Law, 2016). Many countries have their own healthcare laws, but
in the US, HIPAA appears to be one of the strongest frameworks for healthcare
data governance.
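The distinction between identifiable PHI and aggregate data can be illustrated in code. The Python sketch below drops a set of direct identifier fields from each record and then reports only counts per illness, suppressing small counts. The identifier list, the field names, and the suppression threshold are assumptions for illustration. This is not a complete HIPAA de-identification procedure (the Safe Harbor method alone lists 18 identifier types), and any real use would need a compliance review.

```python
from collections import Counter

# Hypothetical subset of direct identifiers to strip (NOT the full Safe Harbor list).
DIRECT_IDENTIFIERS = {"name", "patient_id", "ssn", "address", "dob", "email", "phone"}
MIN_CELL_SIZE = 10  # assumed threshold: suppress counts smaller than this


def strip_identifiers(record: dict[str, str]) -> dict[str, str]:
    """Remove direct identifier fields from one record."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def aggregate_by_illness(records: list[dict[str, str]],
                         min_cell: int = MIN_CELL_SIZE) -> dict[str, int]:
    """Count de-identified records per illness, omitting counts below min_cell."""
    counts = Counter(strip_identifiers(r).get("illness", "unknown") for r in records)
    return {illness: n for illness, n in counts.items() if n >= min_cell}
```

Suppressing small counts matters because a count of one patient with a rare illness in a small area can re-identify that person even without a name attached.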
The System takes these issues (data privacy and security
breaches) seriously and uses the latest antivirus software, firewalls, etc. to
protect the integrity of data and safeguard patients’ information. The System
complies with the government’s detailed, and sometimes controversial,
regulations and obeys all medical rules. The Cyber-Healthcare System will also
work toward International Organization for Standardization (ISO)
9001 certification in the healthcare industry (Nolan, 2015).
B. Privacy Policies
Information about how users use
the website is collected through a tracking cookie and server access logs.
The collected information includes the following:
a. The IP address from which the user accesses
the website.
b. The type of operating system (OS)
and browser the user uses to access the System site.
c. The date and time the user accesses the
Cyber-Healthcare System site.
d. The HTML pages users visit.
e. The address of the page from
which the user followed a link to the System site.
Some of this information is gathered
by a tracking cookie set by the Hadoop Analytics or Google Analytics
service, as described in the privacy policy. Users may refer to their browser
documentation for instructions on disabling the cookie if they do not want to
share the data with Hadoop or Google.
The Cyber-Healthcare System respects patients’ rights. Patients
have the following typical rights:
- The right to receive notice of privacy practices from healthcare providers.
- The right to see their protected health information and receive a copy.
- The right to request changes to their records to correct errors or add
information.
- The right to receive a list (accounting) of disclosures of their PHI.
- The right to request confidential communication.
- The right to complain.
The Cyber-Healthcare System gathers
this information to make the website more useful and friendly to visitors and
to better understand how and when the website is used. The Cyber-Healthcare
System does not collect or track personally identifiable information, or
associate gathered data with any personally identifying information from other
sources.
By using this website, users consent
to the collection of this data in the manner described and for the purpose of
solving the challenges and business problems in the healthcare field (Apache
Software Foundation, 2014; Natarajan, 2012).
C. Governance
HIPAA is the federal Health
Insurance Portability and Accountability Act of 1996. It was
designed to safeguard healthcare information, help people retain health
insurance, and help control administrative costs in the healthcare industry
(HIPAA, 1996). On the privacy issue, HIPAA emphasizes the protection and
maintenance of personal health information in all health-related organizations.
HIPAA requires (1) frontline providers (e.g., physicians, nurses, etc.), (2)
medical producers (e.g., pharmaceutical and medical device companies), and
(3) payers (e.g., insurance companies) to comply with all of its laws and rules
in governance.
The Cyber-Healthcare System complies with
all HIPAA governance rules.
D. Assumptions and limitations
The Cyber-Healthcare System is
developed and designed based on the following assumptions and limitations:
1. Assumptions (Flower, 1999):
- The System’s clients are
patients.
- The System’s contact with
patients is high intensity, low touch.
- Doctors are independent
carriers of information and judgment.
- Healthcare is event-driven.
- Much of ill health will be
predictable and preventable.
- Patients will be partners in
managing their health.
- Data in the System’s
centralized repository is assumed to be clean, reliable, and credible.
- All institutions such as
hospitals, clinics, nursing homes, etc. use the same platform to access, view,
query, and enter the large data sets in the centralized repository.
- All frontline care providers
in the care team are trained to use the System properly and professionally.
- The System keeps all sources
of time visible to the guest synchronized to a single time source, the
monotonic host time.
2. Limitations (Hortonworks, 2016)
- Some experimental features are
beta (labeled as experimental). Such beta features are provided but are not
formally supported. However, users’ suggestions and feedback are welcome.
- Poor performance with 32-bit
AMD CPUs may affect Windows and Solaris platforms.
- Poor performance with some 32-bit
Intel CPU models mainly affects Windows, Solaris, and some Linux kernels.
- NX (no execute, data execution
prevention) only works on 64-bit OS computers.
- Windows XP has slower
transmission rates because it supports segmentation offloading.
- Shared folders are not
supported on OS/2 computers.
E. Justification
The Cyber-Healthcare System is a
modern state-of-the-art system in the contemporary network of hospitals,
outpatient clinics, nursing homes, and rehabilitation centers in the North-East
region. The System is developed to eliminate isolation among hospitals, reduce
inefficiency in care management, and prevent a loss of opportunities for
advancing patient treatments. The Cyber-Healthcare System is designed with the
following justifications:
1. Centralizing the scattered sources
of colossal data sets from many agencies, various hospitals, and clinics.
2. Transforming huge, unreliable data
sets with duplicated and redundant data and information into credible and
reliable data sets.
3. Establishing a large healthcare
network system in the region to allow users such as patients and frontline care
providers (physicians, nurses, family members) with the different privilege to
access, view, search information that is needed or required for patient
treatments, and cures at low cost possible.
4. The Cyber-Healthcare System is
implemented in a Hadoop environment, as described in Section I (Network
Architecture) above.
5. The System’s architecture is
developed around four target elements:
a. Patients.
b. The care team, which consists of
physicians, nurses, and family members.
c. The organization, which includes
infrastructure and resources such as hospitals, clinics, nursing homes, and
rehabilitation centers.
d. The environment, which comprises
regulation, policy, and the market, e.g., regulators, Medicare, Medicaid,
insurance companies, healthcare purchasers, research funders, etc.
6. The System is designed to tackle
the challenges that huge data sets pose in the healthcare industry. These
challenges include:
a. Capturing data is difficult.
b. Curating data is not easy.
c. Storing data requires large
amounts of memory and disk space.
d. Sharing data is
complicated.
e. Transferring data takes a long
time because of its size.
f. Analyzing data requires
advanced analytical tools.
g. Presenting the results is
complex.
7. Organizations in the System can
provide better, higher-quality services
based on historical data from patients’ previous medical records.
8. The System has a data visualization
feature through which users can access the following (Schneiderman,
Plaisant, & Hesse, 2013):
- Personal health information
- Clinical health information
- Public health information
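Justification 2 above mentions transforming duplicated and redundant records into reliable data sets. As a minimal sketch, assuming a hypothetical `PatientRecord` with an identifier, name, date of birth, and source field (none of which this proposal specifies), the Python below treats two records as duplicates when their normalized identifier, name, and date of birth match, and keeps the first copy it sees. It illustrates the idea only and is not the System's actual data-cleansing method; production record linkage usually needs fuzzier matching.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical fields, chosen for illustration only.
    patient_id: str
    name: str
    date_of_birth: str  # ISO 8601 date, e.g. "1980-04-12"
    source: str         # hospital or clinic that supplied the record


def normalize_key(record: PatientRecord) -> tuple[str, str, str]:
    """Build a matching key that ignores case and extra whitespace."""
    return (
        record.patient_id.strip().upper(),
        " ".join(record.name.split()).lower(),
        record.date_of_birth.strip(),
    )


def deduplicate(records: Iterable[PatientRecord]) -> list[PatientRecord]:
    """Keep the first record seen for each normalized key, preserving order."""
    seen: set[tuple[str, str, str]] = set()
    unique: list[PatientRecord] = []
    for record in records:
        key = normalize_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    feeds = [
        PatientRecord("nh-001", "Jane  Doe", "1980-04-12", "Hospital A"),
        PatientRecord("NH-001", "jane doe", "1980-04-12", "Clinic B"),
        PatientRecord("MA-042", "John Roe", "1975-11-30", "Hospital C"),
    ]
    for r in deduplicate(feeds):
        print(r.patient_id, r.source)
    # prints:
    # nh-001 Hospital A
    # MA-042 Hospital C
```

Keeping the first occurrence is the simplest merge rule; a real pipeline might instead merge fields from every source that describes the same patient.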
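Justification 3 above describes users who hold different privileges to access, view, and search information. One common way to express this is role-based access control. The sketch below assumes hypothetical role names and a hypothetical permission table; the proposal does not define the actual privileges, which would come from the security policy and HIPAA requirements.

```python
from enum import Enum


class Action(Enum):
    VIEW = "view"
    SEARCH = "search"
    ENTER = "enter"


# Hypothetical role-to-permission table, for illustration only.
ROLE_PERMISSIONS: dict[str, frozenset[Action]] = {
    "physician": frozenset({Action.VIEW, Action.SEARCH, Action.ENTER}),
    "nurse": frozenset({Action.VIEW, Action.SEARCH, Action.ENTER}),
    "patient": frozenset({Action.VIEW, Action.SEARCH}),
    "family_member": frozenset({Action.VIEW}),
}


def is_allowed(role: str, action: Action) -> bool:
    """Return True if the role grants the action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, frozenset())


if __name__ == "__main__":
    print(is_allowed("nurse", Action.ENTER))            # True
    print(is_allowed("family_member", Action.SEARCH))   # False
    print(is_allowed("visitor", Action.VIEW))           # False
```

Unknown roles receive no permissions, a deny-by-default choice that suits protected health information.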
Conclusion
In summary,
this U5 IP document provided a design proposal for the Cyber-Healthcare System
in the North-East region, which applies big data analytics to health care. It
covered four sections: (1) network architecture, (2) the Cassandra database,
(3) business plan, and (4) security policy proposal, summarized as follows:
I. Network Architecture
The
framework for applying big data analytics to complex healthcare data was a Hadoop
ecosystem with its primary components, namely Hadoop Common, Hadoop YARN, Hadoop
HDFS, and Hadoop MapReduce, together with the Cassandra database. The external
interfaces, data flow diagrams, communication flow chart, and overall system
diagram were described in detail.
II. NoSQL Database
Cassandra provided a critical
storage foundation for big healthcare data, including PHI, PII, EHR, and EMR.
It fits well in the Cyber-Healthcare System, where the Hadoop Distributed
File System (HDFS) is part of the data-processing pipeline.
III. Business Plan
The business plan addressed the
executive summary, description, objectives, background, organizational
assessment, target market analysis, market plan, and resources of the
Cyber-Healthcare System.
IV. Security Policy Proposal
The
proposal of security policy covered practice regulations, privacy policy, data
governance, justifications, and assumptions and limitations. The
Cyber-Healthcare System complies with HIPAA and aims to achieve ISO 9001
certification in the healthcare industry (Nolan, 2015).
The virtual Cyber-Healthcare System proposes a
Hadoop framework that applies big data analytics to huge healthcare data sets
to improve healthcare services, reduce costs, enhance R&D in medicine, predict
the probability of disease occurrence, and enable actionable decision-making.
Note that this design covers four categories (network architecture,
databases, business plan, and security policy proposal), exceeding
the requirement of three out of five given categories.
REFERENCES
Apache Software Foundation (2014). What is Apache Hadoop?
Retrieved November 08, 2015 from http://hadoop.apache.org/
Bplans (2016). Free medical and health care sample business
plans. Retrieved
December 06, 2016 from
http://www.bplans.com/medical_and_health_care_business_plan_templates.php
Confino, J. (2010). Why you need NoSQL in your toolbox.
Retrieved October 17, 2016 from
http://chariotsolutions.blogspot.com/2010/01/why-you-need-nosql-in-your-toolbox.html
Flower, J. (1999). The revolution in our assumptions about
healthcare. Retrieved
September 08, 2016 from
http://www.well.com/~bbear/assumptions.html
Gartner Group (2013). Gartner predicts business intelligence
and analytics will remain a top focus for CIOs through 2017. Press Release. Las
Vegas, NV. Retrieved June 4, 2015 from http://www.gartner.com/newsroom/id/2637615
HIPAA Act (1996). The federal Health Insurance
Portability and Accountability Act. Retrieved October 19, 2015 from http://tn.gov/health/topic/hipaa
Information Builder (2013). Data in motion – big data
analytics and healthcare. Retrieved December 04, 2016 from
http://docs.media.bitpipe.com/io_10x/io_109369/item_674791/data%20in%20motion%20big%20data%20analytics.pdf
Natarajan, R. (2012). Apache Hadoop Fundamentals – HDFS and
MapReduce Explained with a Diagram. Retrieved November 01, 2015 from http://www.thegeekstuff.com/2012/01/hadoop-hdfs-mapreduce-intro/
Nolan, J. (2015). Would hospitals benefit from ISO 9001?
Retrieved September 08, 2016
from
http://advisera.com/9001academy/blog/2015/07/21/would-hospitals-benefit-from-iso-9001/
Power, B. (2015). Artificial intelligence is almost ready for
business. Retrieved August 14, 2016 from https://hbr.org/2015/03/artificial-intelligence-is-almost-ready-for-business
Practical Law (2016). PLC - Data protection in the United
States: overview. Retrieved November 21, 2016 from
http://us.practicallaw.com/6-502-0467
Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: A
brief guide to the emerging world of polyglot persistence. Pearson Education.
Schneiderman, B., Plaisant, C., & Hesse, B. (2013). Improving healthcare with
interactive visualization methods. Retrieved
September 06, 2016 from
https://www.cs.umd.edu/~ben/papers/Shneiderman2013Improving.pdf
Upadrasta, B., & Chungath, A. (2014). NoSQL, NewSQL, or RDBMS: How to choose.
Retrieved October 17, 2016 from
http://www.informationweek.com/big-data/big-data-analytics/nosql-newsql-or-rdbms-how-to-choose/a/d-id/1297861
VA/VHA Employee Health Promotion Disease Prevention
Guidebook (2011). Sample business plan – public health. Retrieved December 06, 2016 from www.publichealth.va.gov/docs/employeehealth/12-Sample-Business.pdf