A few notes that I made in the margins of a
book ( written in 1989 ) that I read in 2002
Maybe by now ( in Oct
2016 ) , someone has already implemented the type of EXPERT SYSTEM that I
conceived in my notes
If not , here is a great opportunity for
some Indian Start Up !
I would be happy to guide , if requested
hemen parekh
hcp@RecruitGuru.com
25 Oct 2016
-------------------------------------------------------------------------------------
Book > Expert Systems
Author > Edited by RICHARD FORSYTH
When Read > Aug 2002
-------------------------------------------------------------------------------------
Page 6
Is this like our saying : IF such and such keywords
appear in a resume , THEN
, it may belong to such and such INDUSTRY or FUNCTION ?
Page 7
I believe " knowledge " contained
in our 65,000 resumes, is good enough to develop an Expert System ( ARDIS -
ARGIS )
We started work on ARDIS-ARGIS in 1996 !
But it was taken up seriously only 3 months ago
ARDIS
= Artificial Resume Deciphering Intelligent System
ARGIS
= Artificial Resume Generating Intelligent System
Page 8
Keywords are nothing but descriptions of
resumes
Page 11
I believe the ISYS manual speaks of " Context Tree " - so
does Oracle " Context
Cartridge ( Themes ) "
Page 15
In our case , " Hypothesized Outcome
" could be a resume,
- getting shortlisted by 3P / by Client,
Or ,
a Candidate getting " appointed "
( after interview )
In our case :-
The " presence of the evidence "
could be presence of certain " keywords " in a given resume ( the Horse ) or certain
" Edu Qualification
" or certain " Age
" or certain " Exp
( years ) " or certain " Current Employer " etc
Page 16
In our case, these several " pieces of
evidence " could be ,
* Keywords * Age * Exp * Edu Quali *
Current Industry Background * Current Function Background * Current Designation
Level * Current Salary * Current Employer etc
We could " establish " ODDS ( for
each piece of evidence ) and then apply " Sequentially " , to figure
out the " Probability / Odds " of that particular resume getting
shortlisted / getting selected
We have to examine ( statistically ), resumes
of all candidates shortlisted during last 13 years , to calculate " Odds
"
Page 18
" Automating " the process of
Knowledge Acquisition ?
We could do this ( automating ), by getting
/ inducing the jobseekers to select / fill in , keywords themselves online , in
the web form
Page 19
The " Decision Support " that our
consultants need is : -
" From amongst thousands of resumes in
our databank, which " few " should be sent to client ? Can software
locate those few automatically, which have " excellent probability "
of getting shortlisted / selected ? "
Our consultants , today, spend a lot of
time in doing just this , manually - which we need to automate .
Page 20
These ( few ) resumes are " GOOD
" for this " VACANCY "
Page 22
According to me , this " notation
" is :-
" All human thoughts / speech and
action, are directed towards either increasing the happiness ( of that person )
or towards decreasing the pain , by choosing from amongst available thoughts /
spoken words / actions "
This notation describes all aspects of
human race
This ability to choose the most appropriate
option ( at that point of time ) , makes a human being , " intelligent
"
Page 23
There are millions of " words " in the English language - used by authors of books, poets in songs and lawyers in documents - but the words of interest to us are those used by jobseekers & recruiters in job advts . This is our area of expertise
Data = Probabilities of 10,000 keywords
occurrence amongst " past successful " candidates
Problem Description = See remarks at the
bottom of page 19 for OUR problem description
Page 25
RESUMIX ( Resume Management Software )
claims to contain 100,000 " rules "
* Our expertise in " matchmaking
" of jobseekers and " vacancies " of recruiters
* Our business does fall in such a "
specialist " category
* Persons who have spent 15 years reading
resumes / deciding their " suitability " and interviewing candidates
Page 27
* Agree . We do not expect " Expert System
" to conduct interviews !
* Our consultants do spend 2/3 hours daily
in reading / short-listing resumes
* We want a " Decision Support System
" to assist our consultants , so
they can spend more time in " interview " type of assessment
* If, during last 13 years , we have placed
500 executives , then we / client , must have short-listed 5,000 resumes .
These are enough " Test Cases "
Page 28
Now ( in 2002 ), expert systems have become
an " Essential " to survival of all organizations . We can ignore it
at our peril
Page 29
* We can become VICTORS or VICTIMS : choice
is ours
* I am sure , by 2002 , we must have many
" MATURE " expert system " kernels / shells " commercially
available in the market ( now available for Rs 3,000 / £ 40 )
Page 30
Maybe we could send an email to Mr FORSYTH
himself, to seek his guidance
We will need to explicitly state > our
problem > solution which we seek from
the Expert System ,
and ask him which commercially available " shell " he recommends
email : Richard.Forsyth@uwe.ac.uk
Page 32
* How many does this ( CRI-1986 ) directory
list in 2002 ?
* Google still shows CRI-1986 , as the
latest ! But , " Expert Systems " in Google returned 299,000 links !
* I took a course in X-Ray Crystallography at KU in 1958
Page 33
* When developed , our system would fall under this category
* Most certainly , we should integrate the two
Page 35
The resumes short-listed by our proposed
" Expert System " ( resumes having highest probability of getting
short-listed ), must be manually " examined " - and assigned "
Weightage " by our consultants & these " Weightages " fed
back into the system
Page 37
I believe, our system will be simple "
rule-based " - although , there may be a lot of " processing "
involved , in " Sequential Computation " of probabilities for
keywords related to :
Industry / Function / Designation Level /
Age / Exp / Edu Quali / Attitudes / Attributes / Skills / Knowledge / Salary /
Current Employer / Current posting location / family etc
Page 39
* Abhi / Rajeev :
In my notes on ARDIS / ARGIS , see separate
notes on " Logic for.......... "
Here, I have listed the under-lying rules
Page 40
Expert Knowledge ( - and consequently , the
rules ) contained in RESUMIX have relevance to USA jobseekers - and their
" Style " of resume preparation.
These ( rules ) , may not apply in Indian
context
Page 41
We are trying to establish the " relationship
" between :
(A) Probability of occurrence of a given " keyword " in a given resume,
WITH,
(B) Probability of such a resume getting short-listed
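( One crude way of " establishing " this relationship from our own historical data - a Python sketch ; the resumes and keywords below are dummy placeholders ) :

```python
# Sketch: for each keyword, compare P(keyword appears in a resume) with
# P(resume got shortlisted | keyword appears). Dummy data for illustration.

resumes = [
    {"keywords": {"sap", "mba", "sales"},   "shortlisted": True},
    {"keywords": {"sap", "production"},     "shortlisted": False},
    {"keywords": {"mba", "finance"},        "shortlisted": True},
    {"keywords": {"welding", "production"}, "shortlisted": False},
]

all_keywords = set().union(*(r["keywords"] for r in resumes))
total = len(resumes)

for kw in sorted(all_keywords):
    with_kw = [r for r in resumes if kw in r["keywords"]]
    p_occurrence = len(with_kw) / total
    p_shortlisted_given_kw = sum(r["shortlisted"] for r in with_kw) / len(with_kw)
    print(f"{kw:12s}  P(keyword) = {p_occurrence:.2f}  "
          f"P(shortlisted | keyword) = {p_shortlisted_given_kw:.2f}")
```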
Page 42
So , we will need to prepare a
comprehensive list of inconsistencies , with respect to a resume
Page 43
* We should ask both ( the Expert System and the Experts ) ,
to independently short-list resumes and compare
* We have to experiment with building of an
Expert System which would " test / validate " the assumptions :-
If certain ( which ? ) keywords or search
parameters are found in a resume, it has a higher probability of getting
short-listed / selected
Page 44
* Eg: System short-listing a " Sales
" executive against a " Production " vacancy !
* What / Which " Cause " could have produced, what
/ which , " Effect / Result "
Page 45
In our case , the Expert System , should
relieve our consultants to do more " Intelligent " work of assessing
candidates thru personal interviewing
" Human Use of Human Beings " by
Norbert Wiener ( read first in 1956 )
Page 47
Eg:
(1) Entering email resumes in structured
database of Module 1
(2) Reconstituting a resume ( converted
bio-data ) thru ARGIS
For these " tasks " , we should
not need human beings at all
Read , " What Will Be " by
Michael Dertouzos ( MIT Lab - 1997 )
Page 48
* Even when our own Expert System "
short-lists " the resumes ( based on perceived high probability of
appointment ) , our consultants would still need to go thru these resumes before
sending to clients. They would need to " interpret "
* Read all my notes written over last 13
years
Page 50
* Our future / new consultants , need to be
taken thru OES ( Order Execution System ) , step by step, thru the entire
process - thru simulation , ie; fake search assignment
* Our " Task Area " is quite
obvious - but may not be simple , viz:
We must find the " right "
candidates for our clients , in shortest possible time
* In the 13 years since this book was written , " Mobile Computing " has made enormous strides . Also , the internet arrived in a big way in 1995
By March 2004 , I envisage our consultants carrying their laptops or even smaller mobile computers & searching our MAROL database for suitable candidates ( of course , using the Expert System ) , sitting across the clients' table
Page 58
Resumes are " data " but when
arranged as a " short-list " , they become " information "
because a " short-list " is always in relation to our " Search
Assignment "!
It is that " Search Assignment "
that lends " meaning " to a
set of resumes
Page 59
Are " resumes " , knowledge about
" people " ? - and their " achievements " ?
Page 60
... but , is a human , a part and parcel of
nature ?
Human did not create Nature but did Nature
create Human ?
Our VEDAS say that the entire UNIVERSE is
contained in an ATOM . Maybe they meant that an entire UNIVERSE can arise
from an ATOM !
Page 61
* Read " Aims of Education " by A
N Whitehead
* Inference is a process of drawing /
reaching " conclusions " based on knowledge
Page 62
Calculating probabilities of occurrence of Keywords
and then comparing with keywords contained in " Resumes " , of
SUCCESSFUL candidates
Page 64-65
IF a resume " R " contains keywords : a : b : c :
and,
IF resumes of all past " Successful " candidates also contain keywords : a : b : c : ,
THEN,
resume " R " will also be " successful " ( conclusion )
Our Expert System will be such an "
Automatic Theorem Proving System " ,
WHERE,
" Inference Rules " , will have
to be first figured out / established , from large volumes of past "
co-relations " between " keywords " and " successes "
" Successes " can be defined in a
variety of ways , including :
# Short-listing # Interviewing ( getting interviewed ) # Appointing ( getting appointed )
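( The above " theorem " , as a tiny Python sketch - the keyword sets are invented , and " success " here simply means " got shortlisted " ) :

```python
# Sketch of the page 64-65 inference rule: if resume "R" contains the
# keywords that past "successful" resumes contain, conclude that "R"
# is likely to succeed too. Keywords below are invented.

successful_keywords = {"a", "b", "c"}     # mined from past successful resumes

def likely_successful(resume_keywords, required=successful_keywords):
    """Crude rule: all required keywords must be present."""
    return required.issubset(resume_keywords)

resume_R = {"a", "b", "c", "x", "y"}
resume_S = {"a", "x"}

print(likely_successful(resume_R))   # True  -> forward to client
print(likely_successful(resume_S))   # False -> hold back
```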
Page 67 { 4.3.2 / Backward Chaining }
In our case too, we are trying to "
interpret / diagnose " the " symptoms " ( in our case the "
Keywords " ) , contained in any given resume ( patient ? ) & then
" predict " what are its chances ( ie; probabilities ) of " success
" ( = getting cured ),
ie; getting " Short-listed " /
" Interviewed " / " Appointed "
Page 68
These resumes can be further sub-divided
according to, INDUSTRY / FUNCTION / DESIG LEVEL etc, to reduce population size
of keywords
For us , there are as many " Rules
" as there are " Keywords " in the resumes of the past "
Successful " candidates - with the " frequency of occurrence "
of each such keyword ( in say , 7,500 successful resumes ) , deciding its
" weightage " while applying the rule
Page 69
(1) Our initial assumption > Resumes of " Successful " ( past ) candidates always contain keywords a , b , c , d ..etc
(2) Process > Find all other resumes which contain a , b , c , d
(3) Conclusion > These should " succeed " too
Our goal :
Find all resumes which have high
probability of " success "
System should automatically keep adding to
the database , all the " actually successful " candidates as each
search assignment gets over
Page 70
* In 1957 , this ( travelling salesman
problem ) , was part of " Operations Research " course at Uni of Kansas
* With the huge number-crunching capacities of modern computers , computational costs are not an important consideration
Page 72
* We are plotting frequency of occurrence
of keywords in specific past resumes to generalize our observations
* We can construct a table like this from our 65,000 resumes & then try to develop " algorithms "
* Abhi / Rajeev > In the above table , the last column ( Job or Actual Designation ) can also be substituted by # Industry / # Function / # Edu Qualification , and a new set of " algorithms " will emerge
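( Building such a keyword-frequency table from already-categorized resumes - a Python sketch with dummy data ; the industries and keywords are placeholders ) :

```python
# Sketch: frequency of each keyword within each industry, built from
# resumes already categorized by our consultants. Data below is dummy.

from collections import Counter, defaultdict

resumes = [
    {"industry": "Auto",   "keywords": ["car", "dealer", "sales"]},
    {"industry": "Auto",   "keywords": ["car", "engine"]},
    {"industry": "Pharma", "keywords": ["formulation", "sales"]},
]

freq = defaultdict(Counter)
for r in resumes:
    freq[r["industry"]].update(r["keywords"])

for industry, counter in freq.items():
    print(industry, counter.most_common())
```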
Page 74
Our resumes also " leave out " a
hell of a lot of " Variables " !
A resume is like a jigsaw puzzle with a lot
of missing pieces !
We are basing ourselves on " Statistical Forecasting " , viz:
frequency of occurrence of certain keywords
and attaching ( assigning ) probability values
Page 76
Just imagine , if we too can locate and
deliver to our clients, just that candidate that he needs ! - in the first
place , just those resumes , which he is likely to value / appreciate
I am sure, by now superior languages must
have emerged.
Superior hardware certainly has , as has
" Conventional Tools " of database management
Page 77
Perhaps what was " specialized "
hardware in 1989, must have become quite common today in 2002 - and fairly
cheap too
Page 81
* We must figure out ( - and write down ) ,
what " logic / rules " our consultants use ( even subconsciously ) ,
while selecting / rejecting a resume ( as being suitable ) , for a client need
. Expert System must mimic a human expert
* We are basing ourselves ( ie: our proposed Expert System ) on this " type " ( see " patterns " in " Digital Biology " )
Page 82
This is what we are after ( Statistical
Analysis / Fuzzy Forecasting / Pattern Recognition )
Page 87
Probability that this keyword ( symptom )
will be observed / found , given that the resume ( patient ) belongs to XYZ
industry ( illness )
Page 88
* Random person = any given " incoming " email resume
Influence = " Automobile " industry
Based on our past population ? = { "
Auto" resumes divided by All resumes } probability
If " symptoms " = keywords, which
" symptoms " ( keywords ) , have appeared in " Auto "
industry resumes , OR ,
which keywords have NEVER appeared in
" Auto " resumes , OR
appeared with low frequency ?
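( Bayes' rule applied to a single keyword and a single industry - a Python sketch ; every count and probability below is invented , purely for illustration ) :

```python
# Sketch of the page-88 arithmetic: probability that an incoming resume
# belongs to the "Auto" industry, given that it contains the keyword
# "gearbox". All counts below are invented.

total_resumes = 65000
auto_resumes  = 5000                               # resumes already tagged "Auto"
p_auto        = auto_resumes / total_resumes       # prior P(Auto)

p_kw_given_auto     = 0.30    # "gearbox" seen in 30% of Auto resumes
p_kw_given_not_auto = 0.01    # and in 1% of all other resumes

p_kw = p_kw_given_auto * p_auto + p_kw_given_not_auto * (1 - p_auto)

p_auto_given_kw = p_kw_given_auto * p_auto / p_kw  # posterior P(Auto | keyword)
print(f"P(Auto | 'gearbox') = {p_auto_given_kw:.2f}")
```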
Page 89
With addition of ALL keywords ( including
new keywords , not previously recorded as " keyword " ) from each new
incoming resume , the " prior probabilities " keep changing
Page 90
In this " resume " , assuming it
belongs to a particular " industry " ,
> what keywords can be expected
> what keywords may not be expected
We can also reverse the " reasoning " viz:
> What " industry " ( or " Function " ) might a given resume belong to , if
* certain keywords are present , AND
* certain keywords are absent ?
* The " result / answer " provided by the Expert System can then be tested / verified against what the jobseeker has himself " clicked "
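( Reversing the reasoning with several keywords at once - a naive-Bayes style Python sketch ; all the probabilities and keyword lists are invented ) :

```python
# Sketch: which industry does a resume most likely belong to, given which
# keywords are PRESENT and which are ABSENT? Naive-Bayes style scoring;
# every probability below is invented for illustration.

import math

# P(keyword present | industry), per industry
kw_given_industry = {
    "Auto":    {"gearbox": 0.30, "derivative": 0.01, "sales": 0.40},
    "Finance": {"gearbox": 0.01, "derivative": 0.25, "sales": 0.30},
}
priors = {"Auto": 0.08, "Finance": 0.06}   # other industries omitted, so these need not sum to 1

resume_keywords = {"gearbox", "sales"}      # keywords found in the incoming resume

def log_score(industry):
    score = math.log(priors[industry])
    for kw, p in kw_given_industry[industry].items():
        score += math.log(p if kw in resume_keywords else 1 - p)
    return score

best = max(priors, key=log_score)
print(best)   # "Auto" under these made-up numbers
```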
Page 91
Reiteration
So each new incoming resume would change the " prior probability " ( again and again ), for each Industry / Function / Designation / Edu Quali / Exp etc
Graph : X Axis > Cumulative No of Resumes ; Y Axis > Probability ( 0 to 1 )
Page 92
With 65,000 resumes, ( ie; Patients ), and
13,000 keywords ( symptoms ) , we could get a fairly accurate " estimate
" of " prior probabilities "
This will keep improving / converging ( see
hand-drawn graph on page 91 ) as resume database and keywords database keeps
growing ( especially if we can succeed in downloading thousands of resumes ( or
job advts ) , from Naukri / Monster / JobsAhead etc )
Page 93
In Indian resumes , keyword " Birth
date " would have the probability of 0.99999
Of course, most such keywords are of no
interest to us !
Page 95
For our purpose, " keywords " are
all " items of evidence "
If each and every keyword found in an incoming resume corresponds to our " hypothesis " ( viz: keywords A
/ B / C / , are always present in resumes belonging to ' Auto ' industry ),
then we obtain max possible " Posterior Probability "
So , if our knowledge base ( not only of
keywords , but phrases / sentences / names of current and past employers /
posting cities etc ) is VERY WIDE and VERY DEEP , we would be able to formulate
more accurate hypotheses and obtain higher " posterior probability "
Page 96
So , the key issue is to write down a
" Set of hypothesis "
Page 97
Let us say , keyword " Derivative
" may have a very low frequency of occurrence in 65,000 resumes ( of all
industries put together ) but, it could be a very important keyword for the
" Financial Services " industry
Page 98
Eg: certain keywords are often associated (
found in ) with certain " Industries " or certain " Functions "
( Domain keywords )
Page 99
With each incoming resume , the probability
of each keyword ( in keyword database )
will keep changing
Eg:
Does " Edu Quali " have any role
/ effect / weightage , in the selection of a candidate ?
Eg :
What is the max age ( or Min Exp ) at which
corporate will appoint a Manager / a
General Manager / a Vice President ?
Page 103
Based on frequency distribution in existing
65,000 resumes , say , every 20th incoming resume belongs to ,
> XYZ industry
> ABC function
Page 104
Evidence : If an incoming resume belongs to
" Auto " industry, it would contain keywords , " car /
automobile " etc , OR Edu Quali = Diploma in Auto Engineering
Eg :
Probability of selection of an executive is
ZERO , if he is 65 years of age !
One could assign " probability of
getting short-listed " or even " probability of getting appointed
" for each AGE or for each "
YEARS OF EXPERIENCE " or for each " EDU LEVEL " etc
This is very similar to asking :
" Will this executive ( incoming
resume ) NOT get short-listed ? OR , NOT get appointed ?
Page 105
Obviously ,
# a person ( incoming resume ) having no experience ( fresh graduate ) will NOT get shortlisted for the position of MANAGER ( Zero Probability )
# a person ( incoming resume ) having NO graduate degree , will NOT get shortlisted for the position of MANAGER ( Zero Probability )
# a person ( incoming resume ) with less than 5 years of experience , will NOT get shortlisted for the position of GENERAL MANAGER ( Zero Probability ), but will get shortlisted for the position of SUPERVISOR ( 0.9 probability )
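( These " Zero Probability " conditions are really knock-out rules that can run before any probability arithmetic - a Python sketch using the thresholds noted above ) :

```python
# Sketch of the page-105 knock-out rules: conditions that make the
# probability of shortlisting ZERO, applied before any scoring.

def knock_out(candidate, position):
    """Return a reason string if the candidate is ruled out, else None."""
    if position == "MANAGER":
        if candidate["experience_years"] == 0:
            return "fresh graduate cannot be MANAGER"
        if not candidate["has_graduate_degree"]:
            return "no graduate degree"
    if position == "GENERAL MANAGER" and candidate["experience_years"] < 5:
        return "less than 5 years experience"
    return None

candidate = {"experience_years": 3, "has_graduate_degree": True}
print(knock_out(candidate, "GENERAL MANAGER"))   # "less than 5 years experience"
print(knock_out(candidate, "MANAGER"))           # None -> goes on to scoring
```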
Page 106
So , we need to build up a database of WHO
( executive ) got shortlisted / appointed , by WHOM ( client ) and WHEN , and
WHY ( to best of our knowledge ), over the last 13 years and HOW MUCH his
" background " matched " Client Requirement " and Keywords
in resumes
Page 107
* How " good " or " bad " is a given resume for a given client's needs ?
* Co-relating ,
> Search parameters used , with
> Search results ( resumes / keywords in resumes ) ,
for each and every online / offline Resume Search
Page 109
* Is this like saying : " Resume A belongs to Engineering Industry to some degree and also to Automobile Industry to some degree " ?
The same can be said about " Functions "
* Same as our rating one resume as " Good " and another as " Bad " ( of course , in relation to client's stated needs )
Page 110
Eg: SET A > Resumes belonging to Eng Industry
SET B > Resumes belonging to Auto Industry
A resume can belong to both the sets with
" different degree of membership "
Page 114
In our case , we have to make a " discrete decision " , viz; " Shall I shortlist & forward this
resume to client or not ?
Page 115
* If a given keyword is present in a resume , then treat the resume as belonging to " Eng Ind "
Or ( better ) , a " given combination " of keywords being present or absent in a resume should be classified under ( say ) , " Engg Ind "
* For us , " real world experience " = several sets of " keywords " discovered in 65,000 resumes which are already categorized ( by human experts ) as belonging to Industry or Function , A or B or C
Page 118
Eg : A given resume is a " very close match " / " close
match " / " fairly close match " / " not a very close match
" ,
with client's requirement
Page 127
* Who knows , what " new / different " keywords would appear ( - and what will disappear ) , in a given type of resumes , over the next 5 / 10 years ?
* We may think of our Corporate Database in these terms and call it " Corporate Knowledgebase "
Page 129
* One could replace " Colour of Teeth " = Designation Level ( MD / CEO / President / GM )
* " Data Structure " table with following columns :
> Keywords
> Industry
> Function
> Designation
> Education
> Age
> Experience
> Current Employer
> Skills
> Attitudes
> Attributes
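( The " Data Structure " above , written out as a Python record - the field names follow the column list ; the types are my assumptions ) :

```python
# Sketch of the page-129 "Data Structure" as a Python record.
# Field names follow the column list above; the types are assumptions.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ResumeRecord:
    keywords: List[str] = field(default_factory=list)
    industry: str = ""
    function: str = ""
    designation: str = ""
    education: str = ""
    age: int = 0
    experience_years: int = 0
    current_employer: str = ""
    skills: List[str] = field(default_factory=list)
    attitudes: List[str] = field(default_factory=list)
    attributes: List[str] = field(default_factory=list)

r = ResumeRecord(keywords=["gearbox", "dealer"], industry="Auto",
                 designation="General Manager", age=42, experience_years=18)
print(r.industry, r.designation)
```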
Page 132
Is this somewhat like saying :
" This ( given ) resume belongs to
Industry X or Industry Y ?
Number of times a given resume got
shortlisted in years , X / X+1 / X+2 / etc
If we treat " getting shortlisted
" as a measure of " Success " , ( which is as close to defining
" success " as we can ), =
prize money won
Of course , in our case , " No of times getting shortlisted " ( in any given year ) , is a function of ,
> Industry background
> Function background
> Designation
> Edu
> Age ( Max )
> Exp ( Min )
> Skills
> Knowledge etc
Page 133
Which is what the resumes are made up of (
- sentences )
See my notes on ARDIS and ARGIS
Page 136
This was written 13 years ago . In last few
months, scientists have implanted simple micro sensors in human body and
succeeded in " integrating " these ( chips ) into human nervous
system !
Eg : restoring " vision " thru
chip implant ( Macular Degeneration of Retina )
Page 143
In ARDIS , I have talked about character recognition
/ word recognition / sentence - phrase recognition
Page 148
" You see what you want to see and
hear what you want to hear "
I have no doubt these ( Object Oriented Languages ) must have made great strides in the last 20 years
Page 149
* Are our 13,000 keywords , the " Working Memory " ?
* Eg : Frequency distribution of keywords belonging to ( say ) 1,000 resumes belonging to " Pharma Industry " , is a " pattern "
When a new resume arrives, the software
" invokes " this " pattern " to check the amount / degree
of " MATCH "
Page 150
Eg : This is like starting with an
assumption ( hypothesis ) that the next incoming resume belongs to " Auto
Industry " or to " Sales
" function , and then proceed to prove / disprove it
Page 151
Keywords pertaining to any given "
Industry " or " Function " will go on changing over the years ,
as new skills and knowledge gets added so " recent ' keywords are more
valid
Eg : " Balanced Score Card " was
totally unknown 2 years ago !
Page 152
In case of " keywords " , is this
comparable to ordering ( ie ; arranging ) by frequency of occurrence ?
Page 153
* Treating child as an " Expert System " , capable of drawing inference
* Eg: blocks of different colours
* Rules will remain same
* Keywords will change over time ( working memory )
* Gross ( or crude ) search , to be refined later on
Page 154
* Obtaining an understanding of what the system is trying to achieve
* I suppose this has already happened
Page 155
* See my notes on " Words and
Sentences that follow "
Page 178
* We are thinking in terms of past " Resumes " which have been " short-listed "
* I have already written some rules
* This must have happened in last 13 years !
Page 179
We must question our consultants as to what
logic / rules do they use / apply ( even subconsciously ) - to short-list or
rather select from shortlisted candidates
Page 181
" KNOWLEDGE ACQUISITION
TOOLS " , developed by us over the years :
* Highlighter
* Eliminator
* Refiner
* Compiler
* Educator
* Composer
* Matchmaker
* Member Segregator
* Mapper ( to be developed )
Page 186
I have written some rules but many more
need to be written
Page 191
Like " weightages" in Neural
Networks ?
To find " real evidence " , take
resumes ( keywords ) AND " Interview Assessment Sheets " of "
Successful " candidates and find co-relation
Eg : Rules re " Designation Level
" could conflict with rules re: " Experience ( years ) " or
" Age ( Min ) "
Page 192
Special Rule > Eg: Edu Quali = CA / ICWA / MBA , for
" Finance " function
General Rule > Basic degree in commerce
" Rule Sets " on Age ( max ) /
Exp ( min ) / Edu Quali / Industry / Function / Designation level ..etc
If our Corporate Client can explicitly
state " weightage " for each of the above , it would simplify the design of expert system
Mr Nagle had actually built this (
weightage ) factor in our HH3P search
engine , way back in 1992
Instead of clients , consultants entered
these " weightages " in search
engine ; but resume database was too small ( may be < 5,000 )
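( Client-supplied " weightages " applied over the rule sets - a Python sketch ; the weights and the per-parameter scoring functions below are invented for illustration ) :

```python
# Sketch: if the client explicitly states a "weightage" for each parameter,
# the expert system can combine per-parameter scores (0..1) into one rank.
# Weights and scoring rules below are invented.

client_weights = {            # the client's stated priorities, summing to 1.0
    "edu_quali":  0.30,
    "experience": 0.25,
    "age":        0.15,
    "industry":   0.30,
}

def score_candidate(candidate):
    scores = {
        "edu_quali":  1.0 if candidate["edu"] in {"CA", "ICWA", "MBA"} else 0.4,
        "experience": min(candidate["exp_years"] / 15, 1.0),
        "age":        1.0 if candidate["age"] <= 45 else 0.3,
        "industry":   1.0 if candidate["industry"] == "Finance" else 0.2,
    }
    return sum(client_weights[k] * scores[k] for k in client_weights)

candidate = {"edu": "CA", "exp_years": 12, "age": 40, "industry": "Finance"}
print(f"weighted score = {score_candidate(candidate):.2f}")   # 0.95
```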
Page 197
Quacks sometimes do " cure " a
disease , but we would never " know " why or how !
Page 198
We have databases of :
* Successful ( or " failed " ) candidates , and
* their " Resumes " and " Assessment Sheets " , and
* Keywords contained in these resumes
Page 199
* " Not Fine " = Failed candidates
* Fine = Successful candidates ( from historical data )
* We will need to define which are " successful " and which are " failure " candidates ( of course , in relation to a given vacancy )
* We want to be able to predict which candidates are likely to be " successful "
* We must have data on past 500 " successful " and 5,000 " failure " candidates in our database
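( With ~500 " successful " and ~5,000 " failed " resumes , one could even train a small statistical classifier - a sketch assuming scikit-learn is available ; the training data below is dummy ) :

```python
# Sketch: predicting "successful" vs "failed" candidates from resume
# keywords, assuming scikit-learn is installed. Training data is dummy;
# in practice it would be our 500 successful / 5,000 failed resumes.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

resume_texts = [
    "mba finance derivative treasury",   # successful
    "car gearbox dealer sales",          # successful
    "welding supervisor shift",          # failed
    "data entry typing clerk",           # failed
]
labels = [1, 1, 0, 0]                    # 1 = successful, 0 = failed

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(resume_texts)

model = MultinomialNB()
model.fit(X, labels)

new_resume = ["mba treasury forex"]
prob_success = model.predict_proba(vectorizer.transform(new_resume))[0, 1]
print(f"probability of success = {prob_success:.2f}")
```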
Page 222
Abhi :
This is exactly what our " Knowledge
Tools " do , viz:
* Highlighter
* Eliminator
* Refiner
* Sorter
* Matchmaker
* Educator
* Compiler
* Member Segregator etc
Page 227
Diebold predicted such a factory in his
1958 book " Automatic Factory
"
Someday , we would modify our OES ( Order Execution
System ) in such a way that our clients will " Self Service "
themselves
" SiVA " on JobStreet.com has
elements of such a self-serving system
Page 232
Statistical Analysis of thousands of job advts of the last 5 years could help us extrapolate the next 10 years' trend
Page 236
We are planning a direct link from HR
Managers to our OES ( which is our factory floor )
We will still need " consultants
" - but only to interact ( talk ) with clients & candidates ; not to
fill in INPUT screens of OES !
Page 237
Obviously , author had a vision of Internet
- which , I could envision 2 years earlier in 1987 , in my report "
QUO VADIS "
Page 239
Norbert Wiener had predicted this in 1958
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------