Extending Propositional Satisfiability to Determine Minimal Fuzzy

Report
Extending Propositional Satisfiability to
Determine Minimal Fuzzy-Rough
Reducts
Richard Jensen
Aberystwyth University, UK
Andrew Tuson
City University, UK
Qiang Shen
Aberystwyth University, UK
Richard Jensen, Andrew Tuson and Qiang Shen
Outline
• The importance of feature selection
• Rough set theory
• Fuzzy-rough feature selection (FRFS)
• FRFS-SAT
• Experimentation
• Conclusion
Richard Jensen, Andrew Tuson and Qiang Shen
Feature selection
• Why dimensionality reduction/feature selection?
High dimensional
data
Dimensionality
Reduction
Intractable
Low dimensional
data
Processing System
• Growth of information - need to manage this effectively
• Curse of dimensionality - a problem for machine learning
Richard Jensen, Andrew Tuson and Qiang Shen
Rough set theory
Upper
Approximation
Set A
Lower
Approximation
Equivalence
class Rx
Rx is the set of all points that are indiscernible
with point x in terms of feature subset B
Richard Jensen, Andrew Tuson and Qiang Shen
Discernibility approach
• Decision-relative discernibility matrix
• Compare objects
• Examine attribute values
• For attributes that differ:
cij  {a  C | a( xi )  a( x j ), d ( xi )  d ( x j )}
• If decision values differ, include attributes in matrix
• Else leave slot blank
• Construct discernibility function:
fC(a1,...,am)  {(cij ) :1  j  i | U |, cij  }
Richard Jensen, Andrew Tuson and Qiang Shen
Example
• Remove duplicates
fC(a,b,c,d) =
{a ⋁ b ⋁ c ⋁ d} ⋀ {a ⋁ c ⋁ d} ⋀
{b ⋁ c} ⋀ {d} ⋀ {a ⋁ b ⋁ c} ⋀
{a ⋁ b ⋁ d} ⋀ {b ⋁ c ⋁ d} ⋀ {a ⋁ d}
• Remove supersets
fC(a,b,c,d) = {b ⋁ c} ⋀ {d}
Richard Jensen, Andrew Tuson and Qiang Shen
Finding reducts
• Usually too expensive to search exhaustively for
reducts with minimal cardinality
• Reducts found through:
• Converting from CNF to DNF (expensive)
• Hill-climbing search using clauses (non-optimal)
• Other search methods - GAs etc (non-optimal)
• RSAR-SAT
• Solve directly in SAT formulation.
• DPLL approach is both fast and ensures optimal reducts
Richard Jensen, Andrew Tuson and Qiang Shen
Fuzzy discernibility matrices
• Extension of crisp approach
• Previously, attributes had {0,1} membership to clauses
• Now have membership in [0,1]
• Allows real-coded data as well as nominal.
• Fuzzy DMs can be used to find fuzzy-rough
reducts
Richard Jensen, Andrew Tuson and Qiang Shen
Formulation
• Fuzzy satisfiability
• In crisp SAT, a clause is fully satisfied if at least
one variable in the clause has been set to true
• For the fuzzy case, clauses may be satisfied to a
certain degree depending on which variables
have been assigned the value true
Richard Jensen, Andrew Tuson and Qiang Shen
Experimentation: setup
• 9 benchmark datasets
• Features – 10 to 39
• Objects – 120 to 690
• Methods used:
• FRFS-SAT
• Greedy hill-climbing: fuzzy dependency, fuzzy boundary region and
fuzzy discernibility.
• Evolutionary algorithms: genetic algorithms (GA) and particle swarm
optimization (PSO) using fuzzy dependency
• 10x10-fold cross validation
• FS performed on the training folds, test folds reduced using
discovered reducts
Richard Jensen, Andrew Tuson and Qiang Shen
Experimentation: results
Richard Jensen, Andrew Tuson and Qiang Shen
Conclusion
• Extended propositional satisfiability to enable
search for fuzzy-rough reducts
• New framework for fuzzy satisfiability
• New DPLL algorithm
• Fuzzy clause simplification
• Future work:
•
•
•
•
Non-chronological backtracking
Better heuristics
Unsupervised FS
Other extensions in propositional satisfiability
Richard Jensen, Andrew Tuson and Qiang Shen
• WEKA implementations of all fuzzy-rough
feature selectors and classifiers can be
downloaded from:
Richard Jensen, Andrew Tuson and Qiang Shen
Feature selection
• Feature selection (FS) is a DR technique that
preserves data semantics (meaning of data)
Generation
Evaluation
Stopping
Criterion
Validation
• Subset generation: forwards, backwards, random…
• Evaluation function: determines ‘goodness’ of subsets
• Stopping criterion: decide when to stop subset search
Richard Jensen, Andrew Tuson and Qiang Shen
Algorithm
Richard Jensen, Andrew Tuson and Qiang Shen
Example
Richard Jensen, Andrew Tuson and Qiang Shen

similar documents