Main Challenges
Building a successful data warehouse is itself a challenging
task, and building a data mining model on that data poses many
more challenges, from understanding the business problem and
preparing the data to building and deploying the mining model.
The web poses specific challenges in cleaning, transforming and
loading the data for analysis, as normally 90% of the
click-stream data is of little importance from an analytic
perspective.
List of possible challenges:
» Identification of the origin of the visitor. To get more out
of the click-stream data, the web site visitors need to be
characterized by their demographics. A visitor can be identified
by means of cookies, online forms, etc. If these options are not
available, the visitor can be identified only by the IP address
of the connection from which he is accessing the web site. One
of these methods must be used to identify the origin of the
visitor in order to gain more insight into visitor behavior.
» Calculation of the dwell time for a content page. The time
spent by the visitor on a particular page is a good measure of
the visitor's interests. No direct way is available to calculate
the dwell time of a visitor on a page.
» Identification of a user session. A visitor can be
characterized by studying his browsing behavior in a session,
which is a collection of web-based transactions related by time.
Computing the start and end of a session is a complex process.
» Managing Ecommerce Website Structure Information. The
structure of the web site is important information. Because
electronic documents are continuously created and maintained,
there are multiple challenges in the ETL process for loading and
maintaining the web site structure. The challenges include
handling dynamic pages, handling ancillary pages, extracting
page titles and categories, and handling frequent changes in the
pages served by the web site.
Let us look at each of these problems in detail and, more
importantly, at how to address them.
Challenge 1:
Identification of the Origin
of the Visitor
The web is the most anonymous medium on earth, and web site
visitors want to remain anonymous. It is a great challenge to
discover the personalities of these anonymous visitors from
their behavior while they interact with your web site, and to
capture enough information to do so without infringing on their
privacy. There are four levels at which a user can be
identified, viz.
» Based on the visitor's IP address
» A persistent identifier for that session only
» A persistent identifier that lets us know the same web browser
on a particular computer has returned for a repeat session
» A persistent identifier that lets us know the particular human
being has returned to our web site
Based on the visitor's IP address, we get the country rather
than the person's name. It is better to know at least the
country of the visitor than to know nothing at all. Knowing the
country provides opportunities to personalize the web site for
the visitor's needs, as well as to study the visitor's browsing
behavior with respect to his local time.
IP addresses are allocated dynamically by Internet Service
Providers (ISPs) to their customers, so the IP address is not a
unique way to identify a web site visitor. Databases are
maintained for each part of the globe which give the country,
the contact person of the ISP with his mail id, phone number and
fax number, the IP address allocating authority, the route to
the IP address, etc. This helps to identify the part of the
globe from which the visitor originates.
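
The country lookup described above can be sketched as a search
over a table of allocated address ranges. This is a minimal
illustration, not a real registry client: the ALLOCATIONS table
and the country_of function are hypothetical, the ranges are
reserved documentation networks, and the country codes attached
to them are invented.

```python
import ipaddress

# Hypothetical excerpt of an IP-allocation table. The ranges are reserved
# documentation networks and the country codes are invented for illustration.
ALLOCATIONS = [
    (ipaddress.ip_network("192.0.2.0/24"), "IN"),
    (ipaddress.ip_network("198.51.100.0/24"), "US"),
    (ipaddress.ip_network("203.0.113.0/24"), "AU"),
]


def country_of(ip_text, default="unknown"):
    """Return the country code of the first allocated range containing the IP."""
    address = ipaddress.ip_address(ip_text)
    for network, country in ALLOCATIONS:
        if address in network:
            return country
    return default
```

In practice the table would be loaded from one of the allocation
databases mentioned above, and a sorted structure would replace
the linear scan.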
A persistent identifier for that session only can be passed
through URLs, hidden fields or session identifiers. This helps
avoid the problem of proxy servers. However, only the current
session can be recorded; there is no way of tracking repeat
visits. Browser caching is a further problem: clicking the back
button is not recorded in the web server log, which makes it
impossible to build a complete map of the user's actions. A
possible solution is the use of no-cache tags in the HTML
content.
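
As a rough sketch of the two ideas above, the code below adds a
session identifier to a URL and lists response headers that
discourage caching. The function name tag_url and the parameter
name "sid" are assumptions made for illustration. The headers
are the HTTP-level counterpart of no-cache tags placed in the
HTML, and browsers do not always honor them for back-button
navigation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Response headers that ask browsers and proxies not to reuse a cached copy,
# so that revisiting a page produces a fresh request in the server log.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
}


def tag_url(url, session_id, param="sid"):
    """Return url with the session identifier added as a query parameter."""
    parts = urlsplit(url)
    # Drop any stale identifier before appending the current one.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Every link on a served page would be passed through tag_url so
that the identifier travels with each click of the session.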
A persistent identifier that lets us know the same web browser
on a particular computer has returned for a repeat session can
be implemented through persistent cookies stored on the client
machine. A cookie is a record placed on a user's PC by a web
browser in response to a request from a web server. The cookie
contents are specified by the web server and can only be read
from the domain that is specified in the cookie. This provides a
way to identify the machine from which the user is accessing the
net, not the user. The problem with cookies is that the user
might have disabled them. Even if cookies are enabled, the user
may delete them at any time.
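
A minimal sketch of such a persistent cookie, using Python's
standard http.cookies module, is shown below. The cookie name
"visitor_id", the one-year lifetime and both function names are
assumptions chosen for illustration.

```python
import uuid
from http.cookies import SimpleCookie

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # assumed lifetime


def issue_visitor_cookie(domain, name="visitor_id"):
    """Build a Set-Cookie header value carrying a new random browser identifier."""
    cookie = SimpleCookie()
    cookie[name] = uuid.uuid4().hex
    cookie[name]["domain"] = domain  # only this domain can read the cookie
    cookie[name]["path"] = "/"
    cookie[name]["max-age"] = ONE_YEAR_SECONDS  # persists beyond the session
    return cookie[name].OutputString()


def read_visitor_id(cookie_header, name="visitor_id"):
    """Return the identifier from a request's Cookie header, or None if absent."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return cookie[name].value if name in cookie else None
```

A request that arrives without the cookie (read_visitor_id
returns None) is either a first visit or a browser that has
disabled or deleted cookies, which is exactly the limitation
noted above.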
A persistent identifier that lets us know the particular human
being has returned to the web site is normally implemented via
access through a user name and password. Online forms such as
registration or customization preferences are an excellent
source for linking customers to the clicks they generate. It is
by far the most effective method of gathering visitor
information. Online forms also have problems. It is believed
that when asked for their name on an Internet form, men will
enter a pseudonym 50 percent of the time, and women will use a
pseudonym 80 percent of the time. It is not advisable to ask the
user to fill in a form on his first visit to the site, as this
can be off-putting.
Challenge 2:
Calculation of Dwell
Time
Dwell time is the time spent by the visitor on a content page.
It is an important measure of the relevance of the content for
the user and of the effectiveness of the page in attracting the
visitor. The dwell time can be calculated by finding the
difference between the times of two consecutive content page
requests and subtracting from it the time required to load the
content page and its ancillary files. However, the time required
to load streaming media files such as RealAudio and MPEG should
not be counted in the dwell time computation. In this case, the
dwell time is to be computed from the beginning of the streaming
media download, regardless of whether the rest of the content is
fully downloaded.
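
The calculation can be written out as a small function. This is
a sketch under the assumption that the request times and the
load time are already available from the log; the function name
and the sample timestamps are illustrative. For a page with
streaming media, load_time would cover only the period up to the
start of the stream.

```python
from datetime import datetime, timedelta


def dwell_time(request_time, next_request_time, load_time):
    """Time between two consecutive content page requests, minus load time."""
    dwell = next_request_time - request_time - load_time
    # Clamp at zero in case the logged load time exceeds the gap.
    return max(dwell, timedelta(0))


first = datetime(2024, 1, 1, 10, 0, 0)
second = datetime(2024, 1, 1, 10, 2, 30)
print(dwell_time(first, second, timedelta(seconds=4)))  # 0:02:26
```

The last page of a session has no following request, so its
dwell time cannot be obtained this way.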
Challenge 3:
Identification of
User Session
The start and end of a user session must be identified in order
to analyze the user's behavior in a session, as well as to
measure how effective the design of the web site is at keeping
the visitor on the site for longer. This also helps in
identifying the various "entry pages", the pages through which a
visitor enters the web site, and in designing these pages
effectively by providing links to other pages and placing
appropriate ad banners on them depending on the context. Any
page in a web site can be the entry page for a visitor, as a
keyword search in a search engine can lead the visitor to any
page in the web site. The identification of a session also helps
in identifying the most popular exit pages, which could be the
session killers. Identifying the session killers and redesigning
them effectively may keep users on the web site for longer.
However, there is no direct way to identify the start and the
end of a user session.
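
Since the log gives no direct session boundaries, a common
heuristic is to start a new session whenever a visitor has been
idle for longer than some cut-off. The sketch below applies that
heuristic; it is not a method stated in the text, and the
30-minute timeout, the (visitor_id, timestamp, page) record
layout and the function names are all assumptions.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cut-off, not from the text


def split_sessions(hits, timeout=SESSION_TIMEOUT):
    """Group (visitor_id, timestamp, page) hits into per-visitor sessions.

    A new session starts when a visitor has been idle longer than timeout.
    Returns a list of sessions, each a list of (timestamp, page) tuples.
    """
    sessions = []
    open_session = {}  # visitor_id -> session currently being extended
    for visitor, when, page in sorted(hits, key=lambda h: (h[0], h[1])):
        current = open_session.get(visitor)
        if current is None or when - current[-1][0] > timeout:
            current = []
            sessions.append(current)
            open_session[visitor] = current
        current.append((when, page))
    return sessions


def entry_and_exit(session):
    """Return the first and last page of a session."""
    return session[0][1], session[-1][1]
```

Counting the second element of entry_and_exit over all sessions
gives the most popular exit pages, the candidate session
killers.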
Challenge 4:
Managing Ecommerce
Website Structure Information
Web sites may serve static pages, dynamic pages or a combination
of both, and each page served may contain or link to different
types of files such as documents, images, multimedia, embedded
scripts, etc. Pages can be static HTML documents, or can consist
of just a template, with an application server supplying the
content for the different components of the template. The types
of files served may change frequently. Many new pages can be
added on a daily or weekly basis, and old pages may be
superseded. According to their purpose, the files may be
classified as Company information, Product catalogue, Technical
support, Ordering page, etc. Content pages may have page titles,
which are required for analysis and should be extracted and
loaded into the page dimension.
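
Title extraction and classification for the page dimension might
look like the sketch below, which uses Python's standard
html.parser module. The path prefixes in CATEGORY_BY_PREFIX, the
record layout and the names TitleParser and page_dimension_row
are hypothetical; a real site would derive its categories from
its own URL scheme.

```python
from html.parser import HTMLParser

# Hypothetical mapping from URL path prefixes to page categories.
CATEGORY_BY_PREFIX = {
    "/about": "Company information",
    "/catalogue": "Product catalogue",
    "/support": "Technical support",
    "/order": "Ordering page",
}


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_dimension_row(path, html):
    """Build one page dimension record from a URL path and its HTML."""
    parser = TitleParser()
    parser.feed(html)
    category = next(
        (name for prefix, name in CATEGORY_BY_PREFIX.items() if path.startswith(prefix)),
        "Uncategorized",
    )
    return {"path": path, "title": parser.title.strip(), "category": category}
```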
Dynamic pages. Pages can be generated and served dynamically
based on the parameters given by the visitor on a previous page.
A dynamic page can consist of a template with different
components, and the content for each component can be generated
dynamically from a given set of parameters. The page used is the
same, but the content served differs from one request to the
next. Storing every instance of a dynamic page would drastically
increase the size of the page dimension.
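
One possible way to keep the page dimension small, offered here
only as an assumption since the text does not prescribe a
remedy, is to key each dynamic page by its template rather than
by the full request. In the sketch below the function name
template_key and the example URLs are illustrative.

```python
from urllib.parse import urlsplit


def template_key(url):
    """Reduce a dynamic page URL to the template that served it.

    The query string holds the per-request parameters, so dropping it maps
    every generated instance of a page to a single key.
    """
    return urlsplit(url).path or "/"


requests = [
    "http://example.com/product.jsp?id=10",
    "http://example.com/product.jsp?id=11&color=red",
    "http://example.com/search.jsp?q=shoes",
]
page_keys = sorted({template_key(u) for u in requests})
print(page_keys)  # ['/product.jsp', '/search.jsp']
```

The trade-off is that parameters which matter for analysis, such
as a product id, would then have to be kept elsewhere, for
example as attributes of the click record.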