Releases: datahub-project/datahub
DataHub v0.8.26
This is a Bugfix release meant to address the issue with adding Glossary Terms to Dataset fields present in version 0.8.25.
Release Highlights
- Fixed a bug in the previous release (v0.8.25) where Glossary Terms could not be added to Dataset fields.
DataHub v0.8.25
Known Issues
- Adding Glossary Terms to schema fields does not work with this version due to a bug. Upgrade to v0.8.26 for the fix.
Release Highlights
Buckle up, folks! v0.8.25 brings some very exciting (and highly-requested!) updates.
Notable UI-Based Features
- UI-based Ingestion - as demoed in December Town Hall, we now support creating, configuring, scheduling, & executing batch metadata ingestion using the DataHub user interface. This makes getting metadata into DataHub easier by minimizing the overhead required to operate custom integration pipelines.
- Data Domains - DataHub now supports grouping data assets into logical collections called Domains. Domains are curated, top-level folders or categories where related assets can be explicitly grouped. Read the guide here!
- Data Containers are now supported! A Container is a physical grouping of entities; e.g., a Schema is a container of one or more Datasets, and a Dashboard is a container of one or more Charts.
Notable Metadata Model & Ingestion-Based Features
- Data Quality test results are now supported in the DataHub metadata model. This is the first milestone toward surfacing Dataset & Column-level Data Quality results in the UI (read full scope of work here). Future releases will include a Great Expectations integration & UI support - we’re on track to complete this in Q1 as planned.
- Avro files are now supported in the Data Lake File ingestion source
- Ingest metadata from multiple instances of the same platform type. This has been a very common use case within the Community - you can now differentiate multiple instances of the same platform type! If you already have pre-existing entries, use the datahub migrate command to migrate them over to platform instances.
- Ignore users from Top Users calculation
- BigQuery - Data Profiling on only the latest partition/shard
- (feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in #3813
Notable Fixes
- Fix to support View in Looker
  - feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in #3985
- fix(graphql): support group display name in ownership by @thomasplarsson in #3979
- fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in #3990
- fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in #3926
DataHub Usage Guides
- docs(domains): Adding a User Guide for Domains by @jjoyce0510 in #4038
- docs(ingest): Adding UI ingestion guide by @jjoyce0510 in #4048
What's Changed
- fix(vulnerability): Upgrade gms base image by @dexter-mh-lee in #3962
- logging(frontend): Improve OIDC debug logs by @jjoyce0510 in #3967
- docs(delete): add curl request example to delete entity by @anshbansal in #3928
- fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in #3926
- Feature/dynamic platform icons by @RyanHolstien in #3968
- refactor(ingestion): remove duplicate aspect type by @hsheth2 in #3972
- fix(example): fix typo by @anshbansal in #3907
- fix(ingestion): Restrict python to <=3.9.9 by @treff7es in #3961
- feat(build): remove requirement for git directory for builds by @swaroopjagadish in #3977
- fix(ingestion): tighten conditions for restli json transformation by @hsheth2 in #3973
- fix(ingestion): don't dump variables for config errors by @hsheth2 in #3974
- Bugfix/increase socket timeout by @RyanHolstien in #3982
- feat(ingest): support for Avro data lake files by @kevinhu in #3913
- fix(build): exclude old log4j core by @RickardCardell in #3966
- fix(quickstart): Pin Quickstart version to v0.8.23. by @jjoyce0510 in #3983
- feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in #3985
- fix(graphql): support group display name in ownership by @thomasplarsson in #3979
- fix(quickstart): Assign correct mysql-setup container for M1 and remove "head" default version. by @jjoyce0510 in #3987
- feat(embedded search results): support custom endpoints in embedded search result by @gabe-lyons in #3986
- fix(docker): datahub-gms - build in native, copy to target by @swaroopjagadish in #3992
- fix(ci): moving defaults back to head now that docker builds are green by @swaroopjagadish in #3993
- feat(ui): UI-based ingestion (as featured in Dec Townhall) by @jjoyce0510 in #3975
- quickstart: Adding UI ingestion to quickstart YAML by @jjoyce0510 in #3994
- feat(domains): Adding backend for Asset Domains (p1) by @jjoyce0510 in #3952
- Bug: a bug fix to bigquery_to_datahub.yml file by @dipeshmaurya in #3988
- fix(ingest): check if feature data type is present by @maaaikoool in #3932
- feat(platform-instance): a simple client-only change to support platf… by @swaroopjagadish in #3996
- docs(metadata-model): Adding to Metadata model docs by @jjoyce0510 in #3998
- Add Stash Logo & new Source Icons by @maggiehays in #4002
- feat(domains): UI for Asset Domains (p2) by @jjoyce0510 in #3995
- docs: add missing back tick for metadata-ingestion/README.md by @nickwu241 in #4003
- Bugfix/add missing classes by @RyanHolstien in #4000
- fix(superset): fix connection for redshift by @anshbansal in #3944
- fix(setup): fix setup for M1 by @anshbansal in #3958
- docs:add Optum logo by @maggiehays in #4005
- Refining Metadata Model docs further by @jjoyce0510 in #4001
- fix(docker): Alpine based multiplatform docker build for kafka-setup by @treff7es in #3991
- Bugfix/graph concurrency issue by @RyanHolstien in #4007
- feat(ingest): Add additional snowflake auth by @MikeSchlosser16 in #4009
- fix(ci): Reverting unnecessary domain test changes by @jjoyce0510 in #4013
- fix(metrics): Add metrics for mcl hooks by @dexter-mh-lee in #4008
- feat(platform) - Update FabricType enum to represent more fabrics by @aditya-radhakrishnan in #3997
- feat(ingest): emit flags and stats for profiling telemetry by @kevinhu in #3969
- fix(formatting): fix linting lib version requirement by @anshbansal in #3939
- fix(docs): fix business glossary docs by @anshbansal in #3916
- fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in #3990
- fix(docs): update gms link by @lhvubtqn in #3927
- fix(ingest): lint fix a few files by @swaroopjagadish in #4016
- fix(ingest): adding platform instance urn to data platform instance aspects by @swaroopjagadish in #4015
- feat(ingest): use trino python client for sqlalchemy, supports python… by @mayurinehate in #3888
- fix(spark-lineage): select mock server port dynamically for unit test by @MugdhaHardikar-GSLab in #4018
- (feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in #3813
- Test/add concurrency issue smoke test by @RyanHolstien in #4014
- feat(glossary-terms): Index glossary term custom properties by @jjoyce0510 in #3960
- feat(ingestion): Adding ability to ignore users from top users...
DataHub v0.8.24
Release Highlights
- Adding support for nested Glue schemas
- Adding Data Lake Files ingestion source to support data profiling for local files and files stored in AWS S3; supported file types are CSV, TSV, Parquet, and JSON
- Improvements to readability in UI to format large numbers, including: adding thousands separators & rounding large numbers to millions with raw value available via tooltip
- Miscellaneous bug fixes & improvements
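The number-formatting highlight above (thousands separators, rounding to millions, raw value in a tooltip) can be illustrated with a minimal sketch. This is not the UI's actual implementation, which lives in the React app; the function names, the one-decimal rounding, and the "M" suffix are assumptions made for illustration.

```python
from typing import Tuple


def format_count(value: int) -> str:
    """Return a compact display string for a count (illustrative only)."""
    if abs(value) >= 1_000_000:
        # ASSUMPTION: one decimal place of millions, e.g. 12_345_678 -> "12.3M".
        return f"{value / 1_000_000:.1f}M"
    # Below one million, only add thousands separators.
    return f"{value:,}"


def format_with_tooltip(value: int) -> Tuple[str, str]:
    """Pair the compact label with the exact raw value shown in a tooltip."""
    return format_count(value), f"{value:,}"
```

For example, `format_with_tooltip(12_345_678)` yields the label `"12.3M"` together with the tooltip text `"12,345,678"`.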
What's Changed
- fix(workflow) docker-ingestion is failing bc of an invalid sed command by @dexter-mh-lee in #3896
- refactor(graphql): Migrating Datasets, Charts, Dashboards, Jobs, Flows to Entity V2 endpoint by @jjoyce0510 in #3897
- fix(ingest): populate system metadata for all metadata events (mcp, mcpw) by @swaroopjagadish in #3900
- perf: add/change scripts for tests by @anshbansal in #3840
- fix(glossary): owner should be optional as per docs by @anshbansal in #3858
- feat(ingestion): Support for nested glue schemas by @rslanka in #3895
- docs: change roadmap link by @jeffmerrick in #3904
- feat(kafka): support confluent references by @anshbansal in #3862
- docs (elasticsearch): config error by @JIWEI0 in #3901
- feat(ingestion): Data lake profiling by @kevinhu in #3656
- refactor(search): refactor NUM_RETRIES in esindexbuilder to be configurable by @senni0418 in #3870
- fix(ingest): nifi - replace hardcode password with config variable by @lhvubtqn in #3902
- feat(authentication): propagate expired token exceptions to end user by @gabe-lyons in #3894
- fix(docs): update data lake docs with path_spec details by @kevinhu in #3905
- ci(smoke-test): make tags&terms smoke test wait for ingestion to complete by @gabe-lyons in #3812
- Revert "fix(glossary): owner should be optional as per docs (#3858)" by @anshbansal in #3910
- fix(ingest): operational stats - check if optional fields are present by @aditya-radhakrishnan in #3911
- fix(typo): fix typo in docs by @anshbansal in #3908
- refactor(gql/ui): Misc refactorings by @jjoyce0510 in #3921
- feat(config): make check for frontend instead of gms more robust by @anshbansal in #3919
- feat(spark-lineage): simplified jars, config, auto publish to maven by @swaroopjagadish in #3924
- Bugfix/telemetry soft fail by @RyanHolstien in #3934
- fix(log): fix log levels and formats by @anshbansal in #3943
- docs(metadata-ingestion): fix command for running fast unit tests by @anshbansal in #3942
- fix(ui): update login title css to fit on one line by @aditya-radhakrishnan in #3922
- fix(docs): Clarify available no-code rendering formats in DataQualityRules.pdl by @gabe-lyons in #3912
- docs(links): add links to some recent case studies and blog posts by @anshbansal in #3941
- fix(docs): fix openapi docs by @anshbansal in #3940
- Adding Snappy Lib and JKS File by @arunvasudevan in #3898
- Feature/Issue resolved- Improve table stats readability in UI by @ShubhamThakre in #3889
- refactor(ui): Allow DocumentationTab to optionally use updateDescription mutation by @jjoyce0510 in #3935
- (docs)add moloco logo by @maggiehays in #3945
- refactor(bootstrap data): Add usage and profiles to bootstrap_mce.json by @jjoyce0510 in #3947
- docs(metadata): update relationship query in docs by @gabe-lyons in #3951
- fix(ingestion): Snowflake Usage should continue to emit usage workunits with include_operational_stats enabled. by @rslanka in #3949
- feat(ingestion): Add support for extracting S3->Snowflake and S3->Glue lineages. by @rslanka in #3946
- fix(graphQL): Fixing set ordering in batchGet of entities by @jjoyce0510 in #3950
- feat(elastic-search): changing default bulk index request batch to 1000 by @swaroopjagadish in #3957
- docs (metadata modeling): Fix broken links and doc fixes by @arunvasudevan in #3954
New Contributors
- @JIWEI0 made their first contribution in #3901
- @senni0418 made their first contribution in #3870
- @lhvubtqn made their first contribution in #3902
- @ShubhamThakre made their first contribution in #3889
Full Changelog: v0.8.23...v0.8.24
DataHub v0.8.23
Release Highlights
- Fix critical Dashboard / Charts bug from 0.8.22, where Chart inputs were not being ingested successfully.
- Adding currently deployed version to the UI (under top-right dropdown menu). Also available via the GMS /config endpoint.
- Robustness improvements to DataHub Java Client Package
- Introducing a new Elasticsearch ingestion connector!
- Misc bug fixes & improvements.
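As noted in the highlights above, the deployed version is also exposed by the GMS /config endpoint. The sketch below shows one way a script might read it. The response shape (a "versions" object keyed by project name, each holding a "version" field) is an assumption, as are the helper names; check the actual payload of your deployment before relying on it.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def extract_version(config: Dict[str, Any]) -> Optional[str]:
    """Pull a version string out of a parsed /config payload.

    ASSUMPTION: the version is nested as config["versions"][<project>]["version"].
    """
    versions = config.get("versions", {})
    for project_info in versions.values():
        if isinstance(project_info, dict) and "version" in project_info:
            return project_info["version"]
    return None


def fetch_config(gms_url: str) -> Dict[str, Any]:
    """GET <gms_url>/config and parse the JSON body."""
    url = f"{gms_url.rstrip('/')}/config"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

A typical call would be `extract_version(fetch_config("http://localhost:8080"))`, where the host and port are placeholders for your own GMS address.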
What's Changed
- build: include correct version in metadata-ingestion docker image by @hsheth2 in #3857
- fix(metabase): fix crashes on missing values by @iasoon in #3859
- fix(datahub-client): fix shadow jar build, correct spark-lineage url … by @swaroopjagadish in #3871
- feat(git-version): Add version to the UI and config endpoint by @dexter-mh-lee in #3866
- fix(build): fix shadow jar checker to allow new git.properties by @swaroopjagadish in #3875
- feat(metadata-ingestion): Make datahub-rest client more robust by configurable retries. (#3826) by @RickardCardell in #3860
- fix(github-workflow): Remove duplicate context in kafka setup workflow by @dexter-mh-lee in #3876
- docs(azure-ad): correct default value for username attr by @iasoon in #3861
- docs: fix endpoint URL by @anshbansal in #3852
- fix(cli): disable telemetry in CLI tests by @kevinhu in #3877
- feat(metabase): allow configuring how database engines get mapped to platforms by @iasoon in #3869
- doc(graphql): add some examples by @anshbansal in #3867
- fix(search): Fix issue with filters and autocomplete by @dexter-mh-lee in #3868
- fix(build): remove jcenter from gradle build by @aditya-radhakrishnan in #3882
- (docs)Roadmap, Townhall, & Feature Request link updates by @maggiehays in #3873
- doc(kafka): add permissions required for confluent cloud by @anshbansal in #3850
- feat(ingest): ingestion-specific telemetry by @kevinhu in #3881
- Add AWS MSK Iam Auth Jar to GMS by @arunvasudevan in #3872
- docs(ingestion) azure: specify required permission type by @iasoon in #3886
- feat(ingestion) dbt: support spark sql types by @iasoon in #3880
- update dependency for bigquery. by @varunbharill in #3874
- fix(field-extraction): Fix extraction for unions by @dexter-mh-lee in #3892
- fix(ingest): sqlparser - Not lowercasing looker source's special table name by @treff7es in #3891
- feat(ingest): Support for spectrum external array types by @treff7es in #3890
- feat(Ingestion): Add Elasticsearch Source by @rslanka in #3893
Full Changelog: v0.8.22...v0.8.23
DataHub v0.8.22
Disclaimers!
- Ingesting Chart Inputs was broken in a PR that got into this release. This will be fixed in v0.8.23. If you plan to ingest Charts / Dashboards, we recommend skipping this version and upgrading to v0.8.23 directly.
Release Highlights:
- Support for mapping DBT meta properties of a dataset to metadata operations, such as add_owner, add_term, add_tag etc.
- Java REST emitter library to programmatically generate metadata events from Java-based clients such as from Spark jobs.
- Data freshness indication via Last Updated Timestamp.
- Improvements to data profiling performance and lineage extraction
What's Changed
- feat(snowflake-usage): Generate email address if not exists by @treff7es in #3791
- feat(java datahub-client): add Java REST emitter by @MugdhaHardikar-GSLab in #3781
- fix(docker): Fix path to elastic definition in dev docker compose by @MikeSchlosser16 in #3808
- feat(nocode): Add get entities v2 endpoint that can get without snapshot by @dexter-mh-lee in #3738
- docs(modeling): Add a link to MXE page inside the Metadata Modeling page by @pramodbiligiri in #3765
- docs(fix): fix broken reference by @RyanHolstien in #3814
- feat(java-emitter): improvements to builder API-s, moving spark-linea… by @swaroopjagadish in #3819
- fix(ingestion): Make url an optional field of the DefaultConfig for business glossary by @rslanka in #3817
- fix(ingest): Handle string redshift type by @treff7es in #3811
- feat(gms): add schema registry support for tls in gms by @MikeSchlosser16 in #3804
- Add table, changed formatting and wording by @dannylee8 in #3802
- feat(mae/mcl): Make ingestAspect produce both MCLs and MAEs by @dexter-mh-lee in #3737
- docs(confluent): Add new topic names by @anshbansal in #3825
- (feat)(glossary): Increase number of autocomplete results shown to 25 by @aditya-radhakrishnan in #3821
- feat(sql-parser): Replacing sqlmetadata sql parser lib with sqlineage parser lib by @treff7es in #3806
- feat(profiler): using approximate queries for profiling by @treff7es in #3752
- docs: improve docs for kafka configuration by @abiwill in #3828
- test(fixEbeanEntityServiceTest): fix bug on verification for EbeanEntityService by @RyanHolstien in #3829
- fix(ingest): ignore custom connectors for Glue ingestion by @kevinhu in #3805
- fix(java-emitter): check for null callback by @swaroopjagadish in #3830
- feat(dbt-meta): add support for dbt meta mapping by @swaroopjagadish in #3832
- fix(ingestion): Fix the datetime parsing issue in the metabase source. by @rslanka in #3831
- feat(removeGMA): remove all dependencies on gma libraries by @RyanHolstien in #3835
- perf(ingest): changes to improve ingest performance a bit by @anshbansal in #3837
- fix(azure AD): fix problem with missing key causing failures in ingestion by @anshbansal in #3824
- docs: fix typo by @anshbansal in #3848
- docs(cli): fix wrong heading, add link to release notes by @anshbansal in #3700
- feat(ci): split metadata-ingestion ci to streamline build by @swaroopjagadish in #3854
- fix(dbt): fix warning due to struct type not being mapped by @anshbansal in #3846
- fix(ingest): bigquery-usage - fix remove_extras to remove all partitions by @gfalcone in #3842
- fix(ingestion): handle database=None for dbt ingestion by @iasoon in #3851
- feat(ingest): last updated - show last updated for sql usage sources by @aditya-radhakrishnan in #3845
- feat(lineage): allow for expanding of lineage node titles in the lineage explorer by @gabe-lyons in #3856
New Contributors
- @MikeSchlosser16 made their first contribution in #3808
- @pramodbiligiri made their first contribution in #3765
- @aditya-radhakrishnan made their first contribution in #3821
- @abiwill made their first contribution in #3828
- @gfalcone made their first contribution in #3842
- @iasoon made their first contribution in #3851
Full Changelog: v0.8.21...v0.8.22
v0.8.21
This release includes a fix for timeouts in reindexing of large indices that occurs when new fields are added to an index.
Release Highlights
- Getting Started Modal + Empty State: Improve the empty-state experience by showing a "Getting Started" guide when no data has been ingested into DataHub yet.
- Provide BigQuery credentials via recipe config: Previously BigQuery credentials were provided via environment variable. Going forward they can be provided directly inside the Recipe config.
- Increase re-indexing 30s timeout: Previously elastic reindexing was maxed at a 30 second synchronous timeout. This was causing some upgrades of GMS to fail. This PR increases that timeout to one hour.
What's Changed
- fix(lkml): bump lkml version up to 1.1.2 to support sql_preamble expression by @hyunminch in #3757
- fix(react-ui): fix header min height by @gabe-lyons in #3784
- docs(auth): add Microsoft Azure as an SSO provider (#3779) by @cccs-eric in #3780
- Add azure OIDC doc to sidebar by @jjoyce0510 in #3785
- feat(UI): Add "Getting Started" Modal on fresh deployment by @jjoyce0510 in #3773
- feat(transform): adds simple add dataset properties transform by @sgomezvillamor in #3778
- Update troubleshooting steps for local development with docker by @RyanHolstien in #3788
- docs(redshift): Updating Redshift permission prerequisites in doc by @treff7es in #3777
- fix(superset): fix Superset chart ingestion with an empty metric label by @cccs-eric in #3793
- doc(transforms): adds doc for simple_add_dataset_properties transformer by @sgomezvillamor in #3790
- feat(ingest): Add config option to set Bigquery credential in source config by @treff7es in #3786
- fix(elastic): allow more time for re-indexing tasks by @gabe-lyons in #3794
- docs(kafka): add example for ingestion from confluent cloud by @anshbansal in #3789
New Contributors
- @cccs-eric made their first contribution in #3780
Full Changelog: v0.8.20...v0.8.21
v0.8.20
This release includes the patch for CVE-2021-44228, pinning log4j to 2.17.0. Otherwise, it contains small bug fixes & improvements.
Release Highlights
- Configurable aspect retention in application.yml (disabled by default)
- Metabase Ingestion Source connector
- Constrain log4j to version 2.17.0
- Upgrade logback to 1.2.9
What's Changed
- feat(spark-lineage): add ability to push data lineage from spark to d… by @MugdhaHardikar-GSLab in #3664
- feat(cli): allow to nuke without deleting data in quickstart by @anshbansal in #3655
- feat(Dgraph): Make Dgraph a proper Neo4j alternative by @EnricoMi in #3578
- feat(retention): Add retention to Local DB by @dexter-mh-lee in #3715
- feat(ingest): cleanup deprecated datahub.integrations.airflow.* imports by @hsheth2 in #3732
- feat(ingestion) : Add Metabase Source Connector by @jawadqu in #3602
- fix(ingest): count profiled tables separately in report by @hsheth2 in #3731
- feat(perf-test): changes for perf testing by @anshbansal in #3728
- ci(cypress): adding the foundation for cypress integration tests & some starter coverage for login, search & updates by @gabe-lyons in #3672
- (fix) Elastic search container log4j CVE-2021-44228 vulnerability by @nsbala-tw in #3733
- Revert "feat(Dgraph): Make Dgraph a proper Neo4j alternative" by @gabe-lyons in #3740
- fix(CI): Regenerate Docker Quickstart by @jjoyce0510 in #3741
- fix(DataHubGraph): changing datahub-graph to use underlying session connection. by @varunbharill in #3743
- fix(ingest): Remove unecessary isalpha check for data platforms + warnings by @jjoyce0510 in #3742
- feat(snowflake-usage): add knob for direct objects accesssed vs base objects accessed by @gabe-lyons in #3744
- fix(snowflake): support snowflake allow/deny pattern for lineage and usage by @varunbharill in #3748
- refactor(gms auth): Remove base64 decoding of token service signing key by @jjoyce0510 in #3747
- test(ingest): fix pytest warning for class starting with Test by @hsheth2 in #3745
- feat: enables dbt metadata files to be loaded from URIs by @sgomezvillamor in #3739
- fix(ingestion): Skipping duplicate tables from ingestion by @treff7es in #3753
- feat(Stateful Ingestion): 1/3 Stateful ingestion server changes by @rslanka in #3749
- Fix CVE-2021-44228 continued: log4j constraints to version 2.16.0 by @jjoyce0510 in #3755
- build(ingest): restrict latest mypy version by @hsheth2 in #3756
- doc: Add IOMED as a DataHub adopter by @merqurio in #3758
- docs(spark-lineage): update artifact name and version by @MugdhaHardikar-GSLab in #3760
- feat(profiler): add upper bound on combined query size by @hsheth2 in #3762
- feat(ingestion): Mode retry wait logic to avoid hitting Mode API rate limit by @jawadqu in #3761
- feat(Stateful Ingestion-2/3): Client side changes for checkpointing a source job state. by @rslanka in #3763
- refactor(test): replace CliRunner with run_datahub_cmd method by @hsheth2 in #3746
- feat(bigquery): add support for parsing exported bigquery audit logs by @hyunminch in #3680
- feat(ingest): Adding support for Elasticsearch and Clickhouse by @sudotty in #3227
- Upgrade to logback 1.2.9 to address CVE-2021-42550 by @jjoyce0510 in #3771
- fix(profiling): Disabling expensive profilers by default by @treff7es in #3759
- docs(ingestion): Add details of sensitive info handling by @anshbansal in #3767
- docs(snowflake): Adding documentation about required Snowflake Privileges by @jjoyce0510 in #3770
- Upgrade to 3rd Apache patch for log4j by @xiphl in #3772
- fix(ingestion): Fix for same schema foreign key reference by @treff7es in #3769
- fix(ingest): fix compatibility with google composer by @anshbansal in #3774
Known Issues
We've been made aware that in large deployments the re-indexing step required at boot-up time exceeds the 30 second timeout. We've since made changes to loosen this timeout limit, with these changes coming in 0.8.21.
New Contributors
- @MugdhaHardikar-GSLab made their first contribution in #3664
- @jawadqu made their first contribution in #3602
- @nsbala-tw made their first contribution in #3733
- @merqurio made their first contribution in #3758
- @hyunminch made their first contribution in #3680
- @sudotty made their first contribution in #3227
- @xiphl made their first contribution in #3772
Full Changelog: v0.8.19...v0.8.20
v0.8.19
This release is a fast followup to the more substantial 0.8.18 release addressing bugs a few folks are facing in the Community.
Release Highlights
- Fix an issue with the base64 CLI command, which is not available on some systems.
- Fix usage user extraction, where the email domain was repeated twice.
What's Changed
- fix(recommendations): don't show a 0 character when there are no suggestions by @gabe-lyons in #3720
- fix(mode): support definitions in mode query by @gabe-lyons in #3721
- fix(doc): fixing doc in datahub cli for corpuser urn. by @varunbharill in #3717
- docs(redshift): Adding svv_table privilege requirement to redshift source doc by @treff7es in #3708
- fix(profiler): Fixing division by zero in pct_unique calculation by @treff7es in #3727
- fix(ingest): get mysql geotypes properly by @treff7es in #3726
- fix(ingest): update trino source error handling in get_table_comment by @mayurinehate in #3712
- feat(ingest) Trim long sql queries in usage by @treff7es in #3725
- fix(ingestion): adds missing port to the connection bootstrap by @sgomezvillamor in #3706
- fix(ingest): add source.config.connection.schema_registry_config to SchemaRegistryClient creation by @lvicentesanchez in #3702
- fix(docker): Fix issues with base64 not working on some platforms by @dexter-mh-lee in #3723
- feat(DataHubGraph): Adding utilities methods to DataHubGraph class. by @varunbharill in #3729
- fix(superset): handle dashboards without charts (#3713) by @grumbler in #3714
New Contributors
- @lvicentesanchez made their first contribution in #3702
- @grumbler made their first contribution in #3714
Full Changelog: v0.8.18...v0.8.19
v0.8.18
DataHub Release 0.8.18 is here!
Release Highlights
- Metadata Service Authentication: Make authenticated requests to the Metadata Service APIs (GraphQL + Rest.li)
- Redshift Lineage: Out-of-the-box support for ingesting Dataset->Dataset lineage from Redshift system tables. Includes Tables, Views, and COPY from S3
- Apache Nifi Connector (Beta): Integration with Apache Nifi to extract DataJobs and DataFlows! Read the source docs here. This source is currently incubating in beta.
- Mode Connector (Beta): Integration with Mode Analytics to extract reports, charts, and more! Read the source docs here. This source is currently incubating in beta.
- Add Aspects without a fork: This is a major milestone towards No-Code UI
- Watch the No Code UI Sneak Peek
- Glossary Term Transformer: Allows users to add tags or glossary terms to entities based on a regex match filter (Shoutout to Community Member ecooklin!)
- Bug Fixes:
- [metadata service] Empty search query fails to resolve
- [metadata service] Log4j vulnerability addressed!! Highly recommend folks to upgrade to latest.
- [metadata ingestion] [bigquery] Fix handling of partitioned & snapshotted tables for lineage usage, and basic table indexing.
- [metadata-service] [recommendations] Fix issue where recently viewed and most popular recommendations were not showing up when user urn contains special chars.
- [metadata ingestion] Add config to specify ca certificate path for datahub-rest sink
- [metadata ingestion][snowflake] Handling for special characters in snowflake databases and schemas.
- [ui] Fix Groups page not showing asset ownership correctly
- [ui] Fix issue where markdown links were not clickable.
- [metadata service] Improve search & recommendations performance by ~50%, homepage load by ~50%.
- [cli] Fix issue where deletes by search could not accept an auth token
- [metadata service][policies] Fix invalid Tag creation policy
- [metadata service][upgrade] Fix Spring injection of Entity Client inside datahub-upgrade
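The Glossary Term Transformer highlight above describes attaching terms to entities based on a regex match. The sketch below illustrates only that matching idea; it is not the transformer's real code or configuration format. The function name, the rule table, and the term URNs are hypothetical.

```python
import re
from typing import Dict, List


def terms_for_urn(urn: str, rules: Dict[str, List[str]]) -> List[str]:
    """Collect the terms of every rule whose regex matches the entity URN."""
    matched: List[str] = []
    for pattern, terms in rules.items():
        if re.match(pattern, urn):
            for term in terms:
                # Keep first-seen order and avoid duplicate terms.
                if term not in matched:
                    matched.append(term)
    return matched


# Hypothetical rule table: regex on the dataset URN -> glossary term URNs.
RULES = {
    ".*customers.*": ["urn:li:glossaryTerm:CustomerData"],
    ".*email.*": ["urn:li:glossaryTerm:Email", "urn:li:glossaryTerm:CustomerData"],
}
```

With these rules, a dataset URN containing both "customers" and "email" picks up both terms once each, while a URN matching neither pattern gets an empty list.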
Backwards Incompatible Changes
- The standalone Spring GraphQL Service has been removed. (Replaced in full by Metadata Service GraphQL API)
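With Metadata Service Authentication in this release and GraphQL now served by the Metadata Service itself, a client call might be built as sketched below. The `/api/graphql` path, the Bearer-token header scheme, and the helper name are assumptions made for illustration; consult the authentication docs for the exact endpoint and token format in your deployment.

```python
import json
import urllib.request


def build_graphql_request(base_url: str, token: str, query: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GraphQL POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        # ASSUMPTION: GraphQL is exposed at /api/graphql on the Metadata Service.
        url=f"{base_url.rstrip('/')}/api/graphql",
        data=body,
        headers={
            # ASSUMPTION: the access token is passed as a Bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the request is kept unsent here so the sketch can be inspected without a running server.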
New Contributors
- @robscriva made their first contribution in #3600
- @adriangb made their first contribution in #3582
- @bartlomiejolma made their first contribution in #3650
- @anshbansal made their first contribution in #3653
- @ecooklin made their first contribution in #3657
What's Changed
- style(react-app): add default monospace font to font-family by @robscriva in #3600
- feat(boot): Ingest datahub root user info on boot by @jjoyce0510 in #3603
- [refactor] - Remove GMS GraphQL Service by @arunvasudevan in #3605
- feat(auth): Metadata Service Authentication! by @jjoyce0510 in #3598
- docs:remove hubspot form and instead link to acryldata.io by @jeffmerrick in #3488
- fix(docs): Move transformers to be under metadata ingestion by @aseembansal-gogo in #3591
- fix(bigquery-usage): Fix filters and event joining logic. by @varunbharill in #3610
- feat(cli): adding a put command and docs by @swaroopjagadish in #3614
- feat(elastic): adding es logo by @gabe-lyons in #3611
- feat(profiler): dynamically combine queries by @hsheth2 in #3572
- doc(components): Adding DataHub components overview by @jjoyce0510 in #3606
- fix(java client): Fix Profiling NPE + misc improvements by @jjoyce0510 in #3621
- fix(docs-website): fix incorrect managed url by @jeffmerrick in #3618
- fix(ingest): rectify platform urn in kafka connect source by @mayurinehate in #3624
- docs(okta): Added Okta Logout Settings by @serefacet in #3627
- fix(search): Fix issue when query is empty by @dexter-mh-lee in #3620
- fix(redshift-usage): Add docs for redshift usage ingestion. by @varunbharill in #3617
- fix(ci): pin great expectations version by @swaroopjagadish in #3629
- fix(delete): Remove logic that adds an invalid filter for platform field by @dexter-mh-lee in #3619
- feat(metadata-service): support for custom model extensions without forks by @shirshanka in #3630
- fix(kafka-producer): fix debug logging by @claudio-benfatto in #3626
- fix(tests): fix typo in test name by @adriangb in #3582
- feat(cfg): Add configurable GCP log page size by @jjoyce0510 in #3556
- fix(recommendations): Fix issue with recently viewed and most popular recs not showing up by @dexter-mh-lee in #3631
- fix(ingestion): Add config to specify ca certificate path for datahub-rest sink by @dexter-mh-lee in #3632
- fix(ingest): workaround great-expectations compatibility issue by @hsheth2 in #3634
- fix(ingestion): Handling for special characters in snowflake databases and schemas. by @rslanka in #3635
- fix(group ownership): Fixing Groups Profile ownership by @jjoyce0510 in #3638
- feat(autorender): Auto render aspects that don't have frontend components in the UI by @gabe-lyons in #3597
- docs(business glossary): document the business glossary file format by @gabe-lyons in #3639
- fix(ingestion): Enhance supported and unsupported base_objects_accessed for Snowflake Usage by @rslanka in #3608
- feat(quickstart): Simplify docker generate and compare script by @EnricoMi in #3434
- fix(docs): small fixes to docs and docker images for custom metadata … by @swaroopjagadish in #3640
- fix(mongodb): enable version check for document size filter. by @varunbharill in #3644
- docs: Update to DataHub Adopter logos & Townhall details by @maggiehays in #3648
- feat(build): adds support for incremental build in ingestion by @swaroopjagadish in #3647
- fix(description): fix issue where markdown links are unclickable by @gabe-lyons in #3646
- fix(schema): fix bug where key/value toggle would appear on schema tabs with no fields by @gabe-lyons in #3643
- feat(build): Preflight script for metadata ingestion setup on m1 by @treff7es in #3652
- docs(graphql) Adding additional GraphQL docs by @jjoyce0510 in #3649
- docs: correct title of postgres gms by @bartlomiejolma in #3650
- fix(cli): fix for deletion cli by @anshbansal in #3653
- fix(metadata-io) Adds docker engine configuration checks before running docker-based tests by @pedro93 in #3654
- fix(model): Remove unused PDL from pre-nocode days by @dexter-mh-lee in #3659
- fix(docs): fix docs build on m1 by @anshbansal in #3662
- feat(ingest): add --strict-warnings option by @hsheth2 in #3665
- fix(search): Improve search and recs performance by @dexter-mh-lee in #3660
- feat(metadata-model): adding metadata model doc generation and upload… by @swaroopjagadish in #3667
- fix(ingestion): black formatting by @hsheth2 in #3676
- fix(metadata-ingestion): fix requirements for m1 preflight checks by @gabe-lyons in #3677
- fix(kafka): Add back changes to centralize kafka config by @dexter-mh-lee in #3675
- feat(ingestion): anonymous usage stats by @kevinhu in #3668
- docs(scheduling): re-arrange docs related to scheduling, lineage, CLI by @anshbansal in #3669
- feat(delete): support deleting by searc...
v0.8.17
Notable Changes
- Added Recommendations and redesigned the home page!
  - Modular way to add recommendations throughout the application
  - Recommendation modules for top platforms, recently viewed, popular entities, and top tags/terms were added to the home page
  - Search page also has a top tags/terms module at the bottom
- Ingestion Sources
  - DBT enhancements
    - Creating dbt platform entities to capture dbt node types such as models, tests, sources, seeds, etc., and linking dbt entities with other dbt or underlying platform entities.
  - OpenAPI specs
  - Kafka Connect (Regex based transformers, BigQuery sink)
  - Trino Usage (Starburst)
- Improved lineage viz performance and lineage viz UX
  - Improved layout logic
  - Nodes can be dragged and dropped
- Fixes for delete API not always deleting all of an entity's data
- Improved documentation for adding a custom Metadata Ingestion Source
- Fixes description rendering for Charts, Dashboards, Flows, Jobs
- Add YAML configuration file for Metadata Service
- Filter search results by Sub-Type (Looker Explore, View, etc)
- Support proxying DataHub Frontend requests to Metadata Service at /api/gms
- Multi-platform (x86, arm64) support for Docker images (Apple M1 support)
- Graph Service: DGraph support (phase 1)
What's Changed
- fix(docs): fix image paths and company logo link by @jeffmerrick in #3435
- feat(docs-site): two small tweaks by @gabe-lyons in #3437
- feat(ingestion): support custom properties to be ingested via business glossary yaml by @gabe-lyons in #3438
- fix(restli entity client): fix case where sortCriterion is null by @gabe-lyons in #3436
- feat(lineage): improved lineage performance + simplified layout logic + some easter eggs by @gabe-lyons in #3357
- docs(metamodel): added DataHub's metadata model diagram by @swaroopjagadish in #3449
- fix(tag+terms): improved error messaging & rules on tag + term mutations by @gabe-lyons in #3448
- fix(browse): disable breadcrumb links on non-browsable entities by @gabe-lyons in #3447
- fix(ingest): fix lookml derived tables parsing by @remisalmon in #3443
- docs(docs-site): small nits for docs site homepage by @gabe-lyons in #3444
- perf(ingest): lazy load ingestion plugins by @hsheth2 in #3430
- Fix docs website by @jeffmerrick in #3446
- fix(restore): Fix restore backup jobs by @dexter-mh-lee in #3445
- fix(ingest): lineage for Airflow subdags by @kevinhu in #3351
- docs: Update to Q3 2021 accomplishments by @maggiehays in #3420
- fix(bigquery): Add gcp logging dependency for bigquery source. by @varunbharill in #3451
- build(frontend): unzip depend on yarnBuild by @gabe-lyons in #3452
- feat(react): add handy webpack analyze command by @gabe-lyons in #3454
- test(CI): show test results on GitHub by @EnricoMi in #3362
- docs(transformers): add example of custom tag function by @WaStCo in #3354
- docs: add guide for using custom sources by @DSchmidtDev in #3324
- feat(dbt-ingestion): added possibility to skip specific models by @AndreasTA-AW in #3340
- fix(mongodb): Support filtering mongodb documents as per size. by @varunbharill in #3456
- fix(mysql): Update default mysql collation to utf8mb4_bin by @jjoyce0510 in #3459
- fix(ingestion): Workaround for Python 3.8/3.9 mypy invalid syntax issue with airflow 2.2.0 by @rslanka in #3460
- fix(ui): Fixing UI User + Group display name by @jjoyce0510 in #3461
- fix(react): fix up yarn test error reporting by @gabe-lyons in #3462
- docs(frontend): remove confusing suggestion to manually create users by @gabe-lyons in #3465
- docs: Overhaul of DataHub Features page by @maggiehays in #3439
- docs: Update TownHall Agenda and TownHall History by @maggiehays in #3463
- fix(tags): fix links to tags when there are special chars in the urls by @gabe-lyons in #3464
- fix(CI): Stabilize gradle build by @EnricoMi in #3413
- docs: update next Townhall date in README.md by @maggiehays in #3466
- perf(react bundle): decrease bundle size by 15% by @gabe-lyons in #3468
- fix(graphql): fixing Graphql engine factory when analytics are disabled by @gabe-lyons in #3467
- feat(recommendations): Recommendations infra P1 by @jjoyce0510 in #3455
- refactor(styling): Improving recommendation Tag / Search query list styling by @jjoyce0510 in #3472
- fix(docs): fix transformer doc example by @aseembansal-gogo in #3469
- fix(ingest): redshift source gets external table types properly by @treff7es in #3371
- fix(recs): Remove removed entities from aggregation by @dexter-mh-lee in #3473
- fix(ui): fix double formatting of entity count on home page by @jjoyce0510 in #3474
- fix(subtypes): fix case where subtypes are not being fetched for leaf datasets by @gabe-lyons in #3476
- feat(ingestion): User configurable dataset profiling. by @rslanka in #3453
- styling(ui): improve tag list, glossary term list recommendation styling by @jjoyce0510 in #3475
- feat(ui): Provide filtering capability for Sub Types inside the UI by @jjoyce0510 in #3479
- fix(ingest): correctly support multiple snowflake databases by @hsheth2 in #3482
- fix(datajobs): fetch dataflow properties from a relationship by @gabe-lyons in #3487
- fix(fk): fix schemaField urn construction in foreign keys by @gabe-lyons in #3486
- fix(fk): trim whitespace from fk constraints in the case the fieldspec has leading or trailing whitespace characters by @gabe-lyons in #3485
- feat(dbt): add dbt logo and platform. by @varunbharill in #3483
- feat(lineage): some ux improvements to lineage interactions by @gabe-lyons in #3478
- refactor(nocode): Final part of No-Code cleanup by @jjoyce0510 in #3477
- fix(browse paths): Adjust Default browse path logic for datasets by @jjoyce0510 in #3495
- fix(lineage backend): fix ownership timestamps by @gabe-lyons in #3498
- tests(smoke): introducing first isolated smoke test: updating tags & terms by @gabe-lyons in #3496
- feat(graphql): extend entity client to support aspect methods directly via java by @gabe-lyons in #3489
- fix(aspects): fix null aspects case by @gabe-lyons in #3501
- Docs: Update to Slack & Townhall details by @maggiehays in #3502
- refactor(profiler): add PerfTimer class and fix typos by @hsheth2 in #3497
- fix tiny typo by @andrewm4894 in #3484
- fix(ingestion): Glue job names by @kevinhu in #3503
- fix(fk): fix foreign key styling with modals by @gabe-lyons in #3500
- docs: add path fix for 'command not found' by @dannylee8 in #3490
- docs: nit, grammar by @dannylee8 in #3491
- docs: nit by @dannylee8 in #3492
- Docs: nits by @dannylee8 in #3493
- add tooltip for owner category in dataset profile page by @saxo-lalrishav in #3470
- feat(ingest) : kafka connect source improvements by @mayurinehate in #3481
- feat(ingest): adding support for read-modify-write capabilities durin… by @swaroopjagadish in #3506
- feat(dbt): Dbt enhancements - dbt nodes, lineage, subtype, etc. by @varunbharill in #3519
- docs (Metadata Model): nits by @dannylee8 in #3525
- fix(ingestion): Enhance logging and error-handling in bigquery usage connector. by @rslanka in https://github.com/linkedin/datahub/pul...