Data Analyst
Data Analysts turn raw data into information that can, in turn, drive insights and decisions. Visualizing data is critical to achieving this.
The platform is built to be self-service, providing analysts with a wide array of tools to explore, process and visualize data.
(1) Charting libraries enable all data tables in the platform to be easily visualized. The platform automatically understands different data types and pre-processes columns into measures and dimensions, making it easy to drag and drop them onto the required chart type and visualize the data (a rough sketch of this classification follows this list).
(2) Before analysts can work on structured data, it normally has to pass through data-source connections, ETL pipelines and a defined schema before landing in a data warehouse to be analyzed. The platform automates and executes most of these steps: an analyst just connects the data sources, the platform automatically identifies relationships between them, and with a simple search and selection of columns the analyst can create a warehouse-like schema and start analyzing the data.
(3) The platform enables analysts to design and build their own dashboards. Automation keeps dashboards live and up to date, and transactional data and real-time data streams can also be visualized on them, empowering organizations to make faster business decisions.
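How a platform might split columns into measures and dimensions is not specified here, but a minimal sketch in pandas (with a hypothetical cardinality threshold) conveys the idea behind item (1):

```python
import pandas as pd

def classify_columns(df: pd.DataFrame, max_dim_cardinality: int = 50) -> dict:
    """Heuristically split columns into chart-ready measures and dimensions.

    Floats and high-cardinality integers read as quantities (measures);
    everything else - strings, dates, low-cardinality codes - reads as a
    grouping attribute (dimension). A real platform would use richer
    type inference, but the shape of the idea is the same.
    """
    measures, dimensions = [], []
    for col in df.columns:
        s = df[col]
        if pd.api.types.is_float_dtype(s) or (
            pd.api.types.is_integer_dtype(s) and s.nunique() > max_dim_cardinality
        ):
            measures.append(col)
        else:
            dimensions.append(col)
    return {"measures": measures, "dimensions": dimensions}

sales = pd.DataFrame({
    "region": ["NA", "EU", "NA", "APAC"],
    "order_date": pd.to_datetime(["2021-01-03", "2021-01-04", "2021-01-04", "2021-01-05"]),
    "revenue": [1200.0, 950.5, 430.0, 2210.75],
})
print(classify_columns(sales))
# {'measures': ['revenue'], 'dimensions': ['region', 'order_date']}
```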
Data Engineer
Data engineers bring together data from multiple sources, performing ETL and making the data accessible to the data team in a structured form.
The platform automates many of the repetitive tasks done by data engineers:
(1) The platform helps with data discovery: engineers can search for and identify how and where their data is stored, along with details such as column names and data types. It also identifies potential primary- and foreign-key relationships between data sources, to understand how the data is structured (one possible inference heuristic is sketched after this list).
(2) The platform has pre-built connectors for common databases such as MySQL, MongoDB, … and new connectors are added as needed (see the connector sketch after this list).
(3) Pipelines can be set up quickly by configuring data sources, operators and destinations with drag-and-drop ETL. Pre-built operators such as filters, joins, math functions and null-value replacements can be applied to data sources, with the results stored in specified destinations. Pipelines can be scheduled to run at required frequencies or run manually, and data engineers can choose between serverless and Spark execution (a configuration sketch follows this list).
(4) The platform provides the flexibility to bring in your own scripts and run them on the code engine block. Your scripts can be imported to run on the platform without any additional environment set-up: just select your language and code dependencies, upload your script, and run it seamlessly on your specified infrastructure. The code engine supports versioning and collaboration as well.
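The exact relationship-inference logic behind item (1) is not documented, but a common heuristic is to treat unique, non-null columns as primary-key candidates and value-subset columns as foreign-key candidates. A minimal pandas sketch, using made-up tables:

```python
import pandas as pd

def candidate_keys(df: pd.DataFrame) -> list[str]:
    """Columns whose values are unique and non-null: primary-key candidates."""
    return [c for c in df.columns if df[c].notna().all() and df[c].is_unique]

def candidate_foreign_keys(child: pd.DataFrame, parent: pd.DataFrame) -> list[tuple[str, str]]:
    """(child_col, parent_key) pairs where every child value exists in the parent key.

    Real systems add type checks and column-name similarity scoring to
    filter out coincidental matches; this keeps only the core test.
    """
    pairs = []
    for pk in candidate_keys(parent):
        parent_vals = set(parent[pk])
        for c in child.columns:
            if set(child[c].dropna()) <= parent_vals:
                pairs.append((c, pk))
    return pairs

customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ada", "Grace", "Edsger"]})
orders = pd.DataFrame({"order_id": [10, 11], "customer_id": [1, 3]})
print(candidate_keys(customers))                  # ['customer_id', 'name']
print(candidate_foreign_keys(orders, customers))  # [('customer_id', 'customer_id')]
```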
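The connectors of item (2) are part of the platform itself, but for the two databases named they would typically wrap standard client libraries. A hypothetical read from each (connection strings and database names are placeholders):

```python
import pandas as pd
from sqlalchemy import create_engine   # MySQL access (pip install sqlalchemy pymysql)
from pymongo import MongoClient        # MongoDB access (pip install pymongo)

# Relational source: read a MySQL table into a DataFrame.
engine = create_engine("mysql+pymysql://user:password@localhost:3306/sales")
orders = pd.read_sql("SELECT * FROM orders", engine)

# Document source: read a MongoDB collection into a DataFrame.
client = MongoClient("mongodb://localhost:27017")
events = pd.DataFrame(list(client["analytics"]["events"].find()))
```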
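As a rough illustration of item (3), a drag-and-drop pipeline reduces to configuration: sources, an ordered list of operators and a destination. The sketch below models two of the pre-built operators as plain Python functions over pandas DataFrames; the file names and schedule are hypothetical:

```python
import pandas as pd

# Two pre-built operators modelled as functions; joins, math functions
# and the rest would follow the same DataFrame-in, DataFrame-out shape.
def replace_nulls(df, column, value):            # "null value replacement"
    out = df.copy()
    out[column] = out[column].fillna(value)
    return out

def filter_rows(df, predicate):                  # "filter"
    return df[predicate(df)]

OPERATORS = {"replace_nulls": replace_nulls, "filter": filter_rows}

# A pipeline is just configuration: source -> steps -> destination.
pipeline = {
    "source": "orders.csv",                      # hypothetical input
    "steps": [
        ("replace_nulls", dict(column="discount", value=0.0)),
        ("filter", dict(predicate=lambda df: df["amount"] > 0)),
    ],
    "destination": "clean_orders.parquet",
    "schedule": "0 6 * * *",                     # e.g. run daily at 06:00
}

def run(p):
    df = pd.read_csv(p["source"])
    for name, params in p["steps"]:
        df = OPERATORS[name](df, **params)
    df.to_parquet(p["destination"])
```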
Data Scientist
Data Scientists generate insights from data that are not visible through straightforward analysis. They explore data and connect the dots in unique ways to help businesses truly unlock the potential in their data.
The platform is built with the entire life cycle of data science initiatives in mind.
(1) The platform’s feature engineering allows data scientists to easily search for the data they need, bring it together, and transform and enrich it. A drag-and-drop interface makes it intuitive to combine pre-built operators and custom code engine blocks to prepare the data for machine learning.
(2) Data scientists can visualize data using the built-in charting libraries. All data sources on the platform come with statistics such as row and column counts, min-max values, means and other summary information, helping data scientists make better data-preparation and feature-engineering decisions (see the profiling sketch after this list).
(3) Many common machine learning models are available on the platform through libraries. Data scientists select the features and the models they want to train, input the model parameters and run the pipeline. The platform executes the training and reports evaluation metrics such as error rates, F-scores, AUC and ROC curves and confusion matrices (a minimal training-and-evaluation sketch follows this list).
(4) Data scientists usually rely on IT teams to set up their infrastructure and help them deploy their models into production. The platform’s MLOps features help data scientists deploy multiple models into production themselves and generate REST APIs for consumption by other applications. Real-time scoring capabilities and model-drift monitoring let data scientists run their models in real-life scenarios rather than leaving them as POCs restricted to test data (a minimal scoring-API sketch closes this list).
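The summary statistics in item (2) map directly onto standard DataFrame operations; a short sketch assuming pandas and a hypothetical dataset:

```python
import pandas as pd

df = pd.read_csv("transactions.csv")   # hypothetical dataset

print(df.shape)         # (row count, column count)
print(df.dtypes)        # data type of every column
print(df.describe())    # min, max, mean and quartiles for numeric columns
print(df.isna().sum())  # null counts per column, a common pre-feature-engineering check
```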
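For item (3), the select-features, select-models, report-metrics loop looks roughly like the following scikit-learn sketch, with synthetic data standing in for platform-selected features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in for features selected and prepared on the platform.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Select the models and input their parameters", then run the pipeline.
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    print(name,
          "f1:", round(f1_score(y_test, pred), 3),
          "auc:", round(roc_auc_score(y_test, proba), 3))
    print(confusion_matrix(y_test, pred))
```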
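And for item (4), a REST API generated for a deployed model would behave like a small scoring service. A hypothetical sketch with FastAPI and a previously saved model artifact (the file name and feature shape are assumptions):

```python
# Minimal real-time scoring service (pip install fastapi uvicorn scikit-learn joblib).
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")   # hypothetical trained-model artifact

class Features(BaseModel):
    values: list[float]               # one row of feature values

@app.post("/score")
def score(features: Features) -> dict:
    # Score a single record and return the positive-class probability.
    proba = model.predict_proba([features.values])[0, 1]
    return {"probability": float(proba)}

# Run with: uvicorn scoring_service:app
```

Logging each request's features alongside its prediction, and comparing them against the training distribution over time, is the usual first step toward the model-drift monitoring described above.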