0.4.0
New features and bug fixes:
- Allow specifying the formula used to compute the text feature bin size for `RawFeatureFilter` (see the `RawFeatureFilter.textBinsFormula` argument) #99
- Fixed metadata on `Geolocation` and `GeolocationMap` so that they keep the name of the column in `descriptorValue` #100
- Local scoring (aka Sparkless) using Aardpfark. This enables loading and scoring models locally, without a Spark context, using the Aardpfark (PFA for Spark) and Hadrian libraries instead. This allows orders of magnitude faster scoring times compared to Spark #41
- Add distributions calculated in `RawFeatureFilter` to `ModelInsights` #103
- Added binary sequence transformer & estimator: `BinarySequenceTransformer` and `BinarySequenceEstimator`, plus the associated base traits #84
- Added `StringIndexerHandleInvalid.Keep` option to `OpStringIndexer` (same as in the underlying Spark estimator) #93
- Allow numbers and underscores in feature names #92
- Stable key order for map vectorizers #88
- Keep raw feature distributions calculated in raw feature filter #76
- Transmogrify now uses the smart text vectorizer for text types: `Text`, `TextArea`, `TextMap` and `TextAreaMap` #63
- Transmogrify now uses circular date representations for date feature types: `Date`, `DateTime`, `DateMap` and `DateTimeMap` #100
- Improved test coverage for utils and other modules #50, #53, #67, #69, #70, #71, #72, #73
- Match feature type map hierarchy with regular feature types #49
- Redundant and deadlock-prone end listener removal #52
- OS-neutral filesystem path creation #51
- Make the `Feature` class public and hide its constructor instead #45
- Specify categorical variables in metadata #120
- Fix fill geo location vectorizer values #132
- Adding feature importance for new model types #128
- Adding binary classification bin score evaluator #119
- Apply `DateToUnitCircleTransformer` logic in raw feature filter transformations #130
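The `textBinsFormula` item (#99) makes the number of histogram bins used for text features a pluggable function. As a rough illustration only, the plain Scala sketch below models such a formula as `(summary, defaultBins) => bins`. The `TextSummary` case class, its fields, and the thresholds (10 tokens per row, 5000 tokens per bin, a 10x cap) are hypothetical choices for this sketch, not the actual `RawFeatureFilter` signature or defaults.

```scala
// Hypothetical per-feature statistics for a text feature (not TransmogrifAI's own summary type).
final case class TextSummary(maxTokensPerRow: Double, totalTokens: Double)

object TextBinsSketch {
  // A bins formula maps (summary statistics, default bin count) to the bin count to use.
  type BinsFormula = (TextSummary, Int) => Int

  // Hypothetical formula: keep the default for short, categorical-like text; otherwise grow
  // the bin count with the total number of tokens, clamped to [defaultBins, 10 * defaultBins].
  val scaledBins: BinsFormula = (summary, defaultBins) =>
    if (summary.maxTokensPerRow < 10) defaultBins
    else {
      val wanted = summary.totalTokens / 5000.0
      math.min(math.max(defaultBins.toDouble, wanted), 10.0 * defaultBins).toInt
    }

  def main(args: Array[String]): Unit = {
    println(scaledBins(TextSummary(maxTokensPerRow = 5, totalTokens = 100), 100))
    println(scaledBins(TextSummary(maxTokensPerRow = 500, totalTokens = 2e6), 100))
  }
}
```

Passing the formula in as a function value keeps the filter itself independent of any one sizing heuristic.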
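The `StringIndexerHandleInvalid.Keep` item (#93) follows Spark's `StringIndexer`, where `handleInvalid = "keep"` sends labels not seen during fitting to one extra index equal to the number of known labels. The plain Scala sketch below mimics those semantics without Spark. Ordering labels by descending frequency matches Spark's default ordering. Breaking ties alphabetically is a choice made only for this sketch.

```scala
object StringIndexerKeepSketch {
  // Fit: assign indices by descending label frequency (ties broken alphabetically in this sketch).
  def fit(labels: Seq[String]): Map[String, Int] =
    labels.groupBy(identity).toSeq
      .sortBy { case (label, occurrences) => (-occurrences.size, label) }
      .map { case (label, _) => label }
      .zipWithIndex
      .toMap

  // Transform with "keep" semantics: unseen labels go to the extra bucket at index numLabels.
  def transformKeep(index: Map[String, Int], label: String): Int =
    index.getOrElse(label, index.size)

  def main(args: Array[String]): Unit = {
    val index = fit(Seq("a", "b", "a", "c", "a", "b"))
    println(transformKeep(index, "a")) // most frequent label
    println(transformKeep(index, "z")) // unseen label, kept in the extra bucket
  }
}
```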
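For the stable key order item (#88), the property that matters is that a map feature always expands into columns in the same order, regardless of hash-map iteration order. The plain Scala sketch below achieves that by sorting the observed keys. Sorting alphabetically and filling missing keys with `0.0` are assumptions of the sketch, not necessarily what TransmogrifAI's map vectorizers do.

```scala
object StableMapVectorSketch {
  // Collect every key seen across rows and sort them, so the column order never
  // depends on the iteration order of the underlying hash maps.
  def columnKeys(rows: Seq[Map[String, Double]]): Vector[String] =
    rows.flatMap(_.keys).distinct.sorted.toVector

  // Vectorize one row against the fixed key order, filling absent keys with a default value.
  def vectorize(keys: Vector[String], row: Map[String, Double], fill: Double = 0.0): Vector[Double] =
    keys.map(key => row.getOrElse(key, fill))

  def main(args: Array[String]): Unit = {
    val rows = Seq(Map("b" -> 2.0, "a" -> 1.0), Map("c" -> 3.0))
    val keys = columnKeys(rows)
    rows.foreach(row => println(vectorize(keys, row)))
  }
}
```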
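The circular date items (#100, #130, `DateToUnitCircleTransformer`) encode a periodic quantity such as the hour of day as a point on the unit circle. This places 23:00 and 00:00 close together instead of 23 units apart. The plain Scala sketch below shows the idea for hour of day. The `(cos, sin)` ordering, the use of whole hours, and the UTC time zone are assumptions of the sketch, not the transformer's documented behavior.

```scala
import java.time.{Instant, ZoneOffset}

object UnitCircleSketch {
  // Map a position within a periodic cycle (e.g. hour 0..23 of a 24-hour day) to a point
  // on the unit circle, so the end of one cycle lands next to the start of the next.
  def toUnitCircle(position: Double, period: Double): (Double, Double) = {
    val radians = 2 * math.Pi * position / period
    (math.cos(radians), math.sin(radians))
  }

  // Hour-of-day encoding for an epoch-millisecond timestamp, interpreted in UTC.
  def hourOfDay(epochMillis: Long): (Double, Double) = {
    val hour = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).getHour
    toUnitCircle(hour.toDouble, 24.0)
  }

  def main(args: Array[String]): Unit = {
    println(toUnitCircle(0, 24))  // start of the day
    println(toUnitCircle(23, 24)) // one hour before midnight, close to the start point
    println(hourOfDay(6L * 3600 * 1000))
  }
}
```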
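The bin score evaluator item (#119) concerns binning binary-classification scores. As an illustration only, the plain Scala sketch below splits scores in `[0, 1]` into equal-width bins. For each bin it reports the count, the average score, and the average label (the observed positive rate), and it also reports an overall Brier score. This choice of statistics is an assumption of the sketch and may differ from the evaluator's actual metrics.

```scala
object BinScoreSketch {
  // Per-bin statistics for binary-classification scores in [0, 1].
  final case class BinStats(center: Double, count: Int, avgScore: Double, avgLabel: Double)

  // Returns (Brier score, per-bin statistics) for (score, 0/1 label) pairs.
  def binMetrics(scoresAndLabels: Seq[(Double, Double)], numBins: Int): (Double, Seq[BinStats]) = {
    require(numBins > 0 && scoresAndLabels.nonEmpty, "need at least one bin and one score")
    require(scoresAndLabels.forall { case (s, _) => s >= 0.0 && s <= 1.0 }, "scores must be in [0, 1]")
    val binWidth = 1.0 / numBins
    // A score of exactly 1.0 would map to bin numBins, so clamp it into the last bin.
    def binOf(score: Double): Int = math.min((score / binWidth).toInt, numBins - 1)
    val grouped = scoresAndLabels.groupBy { case (score, _) => binOf(score) }
    val bins = (0 until numBins).map { i =>
      val members = grouped.getOrElse(i, Seq.empty)
      val n = members.size
      BinStats(
        center = (i + 0.5) * binWidth,
        count = n,
        avgScore = if (n == 0) 0.0 else members.map(_._1).sum / n,
        avgLabel = if (n == 0) 0.0 else members.map(_._2).sum / n
      )
    }
    // Brier score: mean squared difference between the score and the 0/1 label.
    val brier = scoresAndLabels.map { case (s, l) => (s - l) * (s - l) }.sum / scoresAndLabels.size
    (brier, bins)
  }

  def main(args: Array[String]): Unit = {
    val (brier, bins) = binMetrics(Seq((0.1, 0.0), (0.2, 0.0), (0.8, 1.0), (0.9, 1.0)), numBins = 2)
    println(s"Brier score: $brier")
    bins.foreach(println)
  }
}
```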
Breaking changes:
- Made a case class to handle model selector metadata #39
- Made `FileOutputCommitter` the default and got rid of `DirectMapreduceOutputCommitter` and `DirectOutputCommitter` #86
- Refactored `OpVectorColumnMetadata` to allow numeric column descriptors #89
- Renamed `JaccardDistance` to `JaccardSimilarity` #80
- New model selector interface #55. The breaking changes relate to the return type and to the way parameters are passed into model selectors. Starting with this version, model selectors return a single result feature of type `Prediction` (instead of a variable number of features: `(pred, raw, prob)`). Example:

```scala
val (pred, raw, prob) = MultiClassificationModelSelector() // won't compile anymore
val prediction = MultiClassificationModelSelector() // ok!
```

Another change is the way parameters are passed into model selectors. Example:

```scala
BinaryClassificationModelSelector
  .withCrossValidation()
  .setLogisticRegressionRegParam(0.05, 0.1) // won't compile anymore
```

Instead one should do:

```scala
import com.salesforce.op.stages.impl.classification._
import org.apache.spark.ml.tuning.ParamGridBuilder

// Pair each model with an explicit grid of candidate hyperparameter values
val lr = new OpLogisticRegression()
val models = Seq(lr -> new ParamGridBuilder().addGrid(lr.regParam, Array(0.05, 0.1)).build())
BinaryClassificationModelSelector
  .withCrossValidation(modelsAndParameters = models)
```

For more examples of how to use the new model selectors, please refer to our documentation and helloworld examples.
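The `JaccardDistance` to `JaccardSimilarity` rename (#80) concerns which quantity the name describes. For sets A and B, the Jaccard similarity is |A intersect B| / |A union B|. It is 1.0 for identical sets and 0.0 for disjoint ones, and the Jaccard distance is one minus that value. The plain Scala sketch below computes both. Treating two empty sets as identical is a convention chosen here, not necessarily the transformer's.

```scala
object JaccardSketch {
  // Jaccard similarity |A intersect B| / |A union B|: 1.0 for identical sets, 0.0 for disjoint ones.
  // Two empty sets are treated as identical (a convention chosen for this sketch).
  def similarity[T](a: Set[T], b: Set[T]): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else (a intersect b).size.toDouble / (a union b).size

  // The corresponding Jaccard distance is the complement of the similarity.
  def distance[T](a: Set[T], b: Set[T]): Double = 1.0 - similarity(a, b)

  def main(args: Array[String]): Unit = {
    println(similarity(Set("spark", "scala", "ml"), Set("scala", "ml", "pfa")))
    println(distance(Set("spark", "scala", "ml"), Set("scala", "ml", "pfa")))
  }
}
```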
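In the new model selector interface (#55), each model is paired with the output of Spark's `ParamGridBuilder.build()`. That output is an `Array[ParamMap]` with one entry per combination of the grid values. The plain Scala sketch below shows only that expansion (a Cartesian product) using ordinary maps. It does not use the Spark or TransmogrifAI APIs, and the parameter names in it are just labels.

```scala
object ParamGridSketch {
  // Expand each parameter's candidate values into every combination (a Cartesian product),
  // producing one parameter assignment per candidate model fit.
  def grid(params: Seq[(String, Seq[Double])]): Seq[Map[String, Double]] =
    params.foldLeft(Seq(Map.empty[String, Double])) { case (partials, (name, values)) =>
      for {
        partial <- partials
        value <- values
      } yield partial + (name -> value)
    }

  def main(args: Array[String]): Unit = {
    val candidates = grid(Seq("regParam" -> Seq(0.05, 0.1), "elasticNetParam" -> Seq(0.0, 0.5)))
    candidates.foreach(println) // one line per combination
  }
}
```

A grid with two `regParam` values and nothing else, as in the changelog example above, yields two candidate settings for cross-validation to compare. Each additional parameter multiplies the number of fits.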
Dependency upgrades & misc:
- CI/CD runtime improvements for CircleCI and TravisCI
- Updated Gradle to 4.10
- Updated `scala-graph` to `1.12.5`
- Updated `scalafmt` to `1.5.1`
- New `transmogrifai-local` subproject #41 introduces `aardpfark` and `hadrian` dependencies