Release History

Scrybe is an automated way to keep track of all your data science artifacts in one place. Register by navigating to web.scrybe.ml.

0.2.10

Updates:

Client:
- Bugfix for the discontinuous graph that appeared when a pandas DataFrame column is set

0.2.9

Updates:

Client:
- CatBoost models are now supported by Scrybe
- You can now use the scrybe.peek API in your Jupyter Notebook to get a quick look at a model, plot figure or dataset tracked by Scrybe.
- For every tracked model and plot, Scrybe outputs its dashboard URL so that it can be easily accessed and commented on
Dashboard:
- The project info shows a summary of selected metrics across all the models. These can be grouped by model algorithm and by the user who created them, which helps in keeping track of progress in the project. You can click on a point in the plot to jump to the model details.
- Name autocompletion in comments using “@”
- Parallel coordinates plots for hyperparameter visualization in case of grid search

0.2.8

Updates:

Client:
- Users can now pass model id instead of the model object in log_custom_model_evaluation_metric
Dashboard:
- Users can now copy model id from model details page

0.2.7

Updates:

Client:
- All datasets and models created by PySpark are now tracked
- IPython code is now captured automatically when model.fit is called
- API to log features
- API to log code files. The pip package information is automatically captured when you upload code files
- Plot legends are now captured
- Documentation for the Scrybe API is now included in the code itself, so help(scrybe.<api_name>) shows the documentation and sample usage
Dashboard:
- Users can create reports for the projects. The items in the report can also be picked from some other project.
- Project details page has changed. Users can now look at all the metrics and hyperparameters in the table
- Users will now get notifications when someone adds a comment on their artifacts
- Plot title will appear as the plot name on the dashboard
- All models are public by default
- Users can now see models and artifacts created by all the users at once
- Users can now see the code that was used to build the model (if you upload the desired files, or automatically if you are using IPython, for example in Jupyter notebooks)
- Users can also see all the pip packages that were used when the model was built

0.2.6

Updates:

Bug fixes.

0.2.5

Updates:

Client:
- All plots are now available on the Scrybe dashboard in the Plots tab
- class_weight for Keras models is now captured as a hyperparameter
Dashboard:
- The Plots tab contains all the plots, which can be filtered by user, dataset and tags
- Plot title, xlabels, etc. are automatically added as tags
- Plots can be bookmarked from the dashboard as well as from the Scrybe client API
- Clicking on a feature on the model detail/comparison page opens a new tab with all plots that are tagged with the selected feature name
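
As an illustration of the plot metadata mentioned above, the sketch below builds a matplotlib figure with a title, axis labels and a legend. It uses only standard matplotlib calls. The Scrybe setup is not described in these notes, so it is left out, and the data values are made up for the example.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the example runs without a display
import matplotlib.pyplot as plt

# Hypothetical training curve, for illustration only.
epochs = [1, 2, 3, 4]
train_loss = [0.90, 0.62, 0.48, 0.41]

fig, ax = plt.subplots()
ax.plot(epochs, train_loss, label="train loss")

# Per the notes above, the title and axis labels are the kind of
# metadata that gets added to the plot as tags on the dashboard.
ax.set_title("Loss by epoch")
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
ax.legend()
```

Giving every plot an explicit title and axis labels should make it easier to find later when filtering the Plots tab by tag.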

0.2.4

Updates:

Client:
- API to bookmark model, dataset or plot object tracked by Scrybe
- API to log feature importance
- Support for lightgbm library
- pandas.concat and pandas.merge lineage issue has been fixed
Dashboard:
- Experiments and datasets tab can now be filtered by bookmarks
- Model comparison now also shows comparison between features and feature importances

0.2.3

Updates:

Client:
- Feature importance is now calculated automatically, using the default parameters, after a .fit call if the estimator has a “feature_importances” property
- Users can upload their own feature importance dictionary by calling scrybe.log_feature_importances
- Scrybe can now track the loading and saving of datasets using h5py, pandas (read_csv, to_csv, to_pickle)
- Support for scikit-learn==0.22
Dashboard:
- Source code of how the dataset was created is now available
- Model detail page will show the features and feature importance if the data is available
- Plots on model predictions (and derived datasets) are now visible in Model detail page
- Experiments tab can now be filtered using features that went into building the model
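
For the feature-importance items in the Client list above, here is a minimal sketch of building a feature importance dictionary with scikit-learn. In scikit-learn the fitted attribute is named `feature_importances_`. The feature names are hypothetical, and the name-to-score dictionary layout is an assumption. The signature of scrybe.log_feature_importances is not given in these notes, so the upload call itself is omitted.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, for illustration only.
X, y = make_classification(
    n_samples=150, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
feature_names = ["f0", "f1", "f2", "f3"]  # hypothetical feature names

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Map each feature name to its importance score (assumed dictionary layout).
importances = {
    name: float(score)
    for name, score in zip(feature_names, model.feature_importances_)
}
```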

Unsupported:

- Path for pandas datasets loaded using joblib.load will not show up in the Datasets tab
- pandas.concat and pandas.merge result in discontinuous lineage
- Tracking of datasets created by groupby operations on DataFrames and Series is currently not supported

0.2.2

Updates:

Client:
- Support for aggregation of models built using RandomizedSearchCV. All models will be captured, but only the best estimator will be shown in the Experiments tab. The remaining models will be accessible from the model detail page of the best estimator
- API to add labels from code. A label can be either a string or a list of strings. When a label is set, all the datasets, models and artifacts that are built afterwards will automatically get tagged with this label. On the dashboard, you can filter the models, datasets or metrics using this label.
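
To make the RandomizedSearchCV item above concrete, the sketch below runs a small randomized search with scikit-learn and separates the best estimator from the full set of candidates. The data and parameter ranges are made up for the example, and nothing Scrybe-specific is called.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [10, 20, 40], "max_depth": [2, 4, None]},
    n_iter=4,  # four candidate models are sampled and fitted
    cv=3,
    random_state=0,
)
search.fit(X, y)

# The single model that, per the notes above, would appear in the Experiments tab.
best_model = search.best_estimator_

# All sampled candidates, including those that did not win.
n_candidates = len(search.cv_results_["params"])
```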

0.2.1

Updates:

Client:
- Support for automatic upload of statistics when calculated using the following functions:
  - DataFrame/Series: (describe, sem, var, std, mean, skew, kurt, median, max, min, ptp, count, nunique, quantile, unique, value_counts, corr, cov, corrwith)
  - NumPy: (quantile, cov, corrcoef, unique (if return_counts=True))
- API to log custom dataset stats
- Faster uploading for plots
Dashboard:
- Dataset details page
  - Shows dataset info, statistics and plots grouped by dataset
- You can now click on a model’s training dataset in the graph to see which features were used and how large the dataset was.
- Commenting on model detail page

Unsupported:

- Not all statistical functions are automatically tracked right now, but the most commonly used ones will be added on user request.
- Statistics for datasets with fewer than 100 rows will not get uploaded.
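
The sketch below shows a few of the pandas and NumPy calls from the list of automatically uploaded statistics, applied to a dataset with more than 100 rows. The column names and values are made up for the example. Only standard pandas and NumPy calls appear, because the Scrybe setup is not described in these notes.

```python
import numpy as np
import pandas as pd

# Synthetic dataset with 200 rows, above the 100-row threshold noted above.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "age": rng.integers(18, 80, size=200),
        "income": rng.normal(50_000, 10_000, size=200),
    }
)

# DataFrame/Series statistics from the supported list.
summary = df.describe()
income_sem = df["income"].sem()
correlation = df.corr()

# NumPy statistics from the supported list; per the notes, np.unique is
# only covered when return_counts=True.
values, counts = np.unique(df["age"].to_numpy(), return_counts=True)
income_q90 = np.quantile(df["income"].to_numpy(), 0.9)
```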

0.2.0

Updates:

Client:
- XGBoost model evaluation metrics are now captured
- Support for Python list, dictionary and tuple:
  - Inputs sent as a list/dict/tuple of NumPy arrays or pandas DataFrames will be correctly tracked
- Plots created using seaborn, matplotlib or pandas.plotting will be captured automatically
Bugfix:
- Removed verbose print statements
Dashboard:
- Datasets tab and dataset details page
  - Shows a list of persisted datasets used in the current project
  - Click-through takes you to the dataset details page containing the lineage graph and all plots associated with this dataset
- Model lineage graph added to model details page

Unsupported:

- Lists, dictionaries and tuples must contain NumPy arrays or pandas DataFrames to be tracked
- Lists of floats/ints will not be tracked automatically
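
Given the limitation above, one possible approach is to convert plain Python lists of numbers to NumPy arrays before passing them on. The sketch below shows only the conversion. That the resulting arrays are then tracked is an assumption based on these notes, and the variable names are hypothetical.

```python
import numpy as np

# A plain list of floats: per the notes above, not tracked automatically.
raw_scores = [0.10, 0.40, 0.35, 0.80]

# Converting to a NumPy array gives an object of a supported type.
scores = np.asarray(raw_scores, dtype=float)

# A list whose elements are NumPy arrays is the supported container shape.
batches = [scores[:2], scores[2:]]
```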

Known Issues:

- The dataset/model lineage might be discontinuous when using n_jobs > 1 in grid search (i.e. parallelism)
- The dataset/model lineage will be discontinuous if pandas.concat is used