Hi! My name is Ivan Muhammad Siegfried. I earned my Master's degree in Physics from Institut Teknologi Bandung in 2019 and my Bachelor's degree in Physics from Padjadjaran University in 2016. I have a strong interest in Computational Physics, especially Density Functional Theory, Computational Fluid Dynamics, Heat Transfer, and Instrumentation. I now focus on Data Science, Machine Learning, and Computer Vision, and currently work in the private sector developing applied technologies that improve human life.
These are the programming languages and tools I have mastered. I often use Python as my main language for Data Science, Machine Learning, Computer Vision, and related work; I use the other languages to solve Computational Physics problems.
This project ingests tweet data using the Twitter API. The raw data is then split by language into Spanish and non-Spanish tweets.
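The language split can be sketched as follows (a minimal illustration, assuming each tweet is a dict carrying the API's `lang` field, where `"es"` marks Spanish; the field names and sample tweets are assumptions, not the project's actual schema):

```python
# Split tweets into Spanish and non-Spanish, assuming the Twitter API's
# "lang" field is present on each record ("es" = Spanish).
def split_by_language(tweets):
    spanish = [t for t in tweets if t.get("lang") == "es"]
    other = [t for t in tweets if t.get("lang") != "es"]
    return spanish, other

tweets = [
    {"id": 1, "lang": "es", "text": "Hola mundo"},
    {"id": 2, "lang": "en", "text": "Hello world"},
    {"id": 3, "lang": "es", "text": "Buenos días"},
]
spanish, other = split_by_language(tweets)
```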
PDF Link

This project splits the data into multiple *.xlsx files based on certain criteria, so that a data scientist can process them further.
PDF Link

This program explores the Movie Dataset using Apache Pig / Tez on the Hortonworks Data Platform.
GitHub Link

This dashboard surfaces insights from US Covid-19 data. It displays deaths per capita, new cases on linear and logarithmic scales, new deaths alongside cumulative cases, and case growth with a 7-day moving average. A per-state table is also provided.
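The 7-day moving average behind the case-growth chart can be sketched like this (a minimal illustration; the daily case numbers are made up, and the dashboard itself is built in Google Data Studio, not Python):

```python
# Trailing moving average: each point is the mean of the last `window` values
# (fewer at the start, before a full window is available).
def moving_average(values, window=7):
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

daily_cases = [10, 20, 30, 40, 50, 60, 70, 80]
smoothed = moving_average(daily_cases)
```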
Google Data Studio Link

This code is an example of Extract, Transform, and Load (ETL) using PySpark.
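The linked code uses PySpark; as a language-agnostic illustration of the same extract → transform → load shape, here is a minimal sketch in plain Python with an in-memory CSV standing in for the real source and sink (the column names and data are invented):

```python
import csv
import io

# Extract: read rows from a source (here, an in-memory CSV for illustration).
raw = io.StringIO("name,amount\nalice,10\nbob,20\n")
rows = list(csv.DictReader(raw))

# Transform: clean and derive new values.
transformed = [{"name": r["name"].title(), "amount": int(r["amount"]) * 2}
               for r in rows]

# Load: write the result to a sink (again in-memory here).
sink = io.StringIO()
writer = csv.DictWriter(sink, fieldnames=["name", "amount"])
writer.writeheader()
writer.writerows(transformed)
```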
GitHub Link

A Python script that generates a fake pizza-order dataset for a Kafka producer, which then pushes the records into a Kafka topic.
Medium Link

One way to store non-relational data, such as tweets from Twitter, is in a non-relational database such as MongoDB or Cassandra. Here, the author demonstrates how data received from Apache Kafka is saved to MongoDB with the help of the pymongo package.
Medium Link

This program finds a suitable model for the stroke healthcare dataset.
Kaggle Link

This program finds a suitable model for the telecommunication dataset.
Kaggle Link

This program finds a suitable model for the bankruptcy dataset.
Kaggle Link

This project aims to determine whether someone should be granted a loan. The program starts by checking the dataset for null values, replacing them with new values, and checking for duplicate rows. Label encoding is then applied to improve prediction performance. Among the models tried (Logistic Regression, Random Forest Classification, XGBoost), XGBoost proves the best, with an accuracy of up to 80%.
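The label-encoding step can be sketched as follows (a minimal illustration of what scikit-learn's LabelEncoder does: each category maps to an integer; the column values below are invented, not the loan dataset's actual categories):

```python
# Map each distinct category to an integer index, in sorted order,
# mirroring scikit-learn's LabelEncoder behaviour.
def label_encode(values):
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

education = ["Graduate", "Not Graduate", "Graduate", "Graduate"]
encoded, mapping = label_encode(education)
```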
Kaggle Link Medium

This program predicts house prices from several parameters using linear regression. It begins with data cleansing, after which the correlations between parameters are examined using the Pearson method. Once the predictor parameters are chosen, the data is trained with Linear Regression, Lasso, ElasticNet, K-Neighbors Regressor, and Gradient Boosting Regressor. The models are then validated to assess their accuracy and the resulting predictions.
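The Pearson correlation used to choose predictors can be sketched as follows (a minimal pure-Python illustration; the two sample columns are invented, and the perfectly linear data is chosen so the expected result, r = 1, is obvious):

```python
# Pearson correlation coefficient: covariance of x and y divided by the
# product of their standard deviations.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

area = [50.0, 80.0, 120.0, 200.0]
price = [100.0, 160.0, 240.0, 400.0]  # exactly 2 * area
r = pearson(area, price)              # perfectly linear data -> r == 1.0
```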
Kaggle Link
Kaggle Link

This program performs image recognition with a convolutional neural network. Edges are detected with mathematical operations such as 2D convolution and max pooling, stacked in a sequential model. The model is trained with the Poisson function as its loss function, then compiled and fit to the dataset. Finally, the model is evaluated by computing its accuracy.
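The two building blocks mentioned above, 2D convolution with an edge-detecting kernel and 2x2 max pooling, can be sketched in pure Python (a minimal illustration with "valid" padding; the 4x4 image and the vertical-edge kernel are invented for the example):

```python
# 2D convolution with "valid" padding: slide the kernel over the image and
# take the element-wise sum of products at each position.
def conv2d(img, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(img[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# 2x2 max pooling: keep the largest value in each 2x2 patch.
def max_pool2x2(img):
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, len(img[0]) - 1, 2)]
            for i in range(0, len(img) - 1, 2)]

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]
feature_map = conv2d(image, vertical_edge)  # large magnitude at the edge
pooled = max_pool2x2(image)
```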
Kaggle Link

This model detects whether a face image is wearing a mask, using pretrained neural network architectures: ResNet50V2, MobileNetV2, and Xception. To specialize them for this task, a hyper-tuning process trains the models to recognize face images with and without masks. The results show that ResNet50V2 and Xception achieve better validation accuracy than MobileNetV2; the trade-off is that MobileNetV2 trains faster than the other two.
Preprint Link

This program analyzes sentiment in Twitter-timeline conversations while the investment scandal masterminded by Jouska was unfolding. The results show that the timeline regarded the issue negatively.
GitHub Link

This program analyzes sentiment in Twitter-timeline conversations about the performance of the Minister of Health in July 2020. The results show that sentiment regarding Minister Terawan was positive.
GitHub Link

This program predicts a person's salary from parameters such as working class, age, education, and occupation. It covers data wrangling, modeling, and validation. The model placed 15th out of 420 participants in the competition.
GitHub Link

This program was created to complete a course at Dicoding Indonesia. It models hand symbols, trained on 1314 training images and 874 test images. The data is fed into a sequential model of neural network layers whose convolutions act as edge detectors. The resulting model reaches an accuracy of 97.54%.
GitHub Link

This program classifies tweets into 5 classes: Extremely Negative, Negative, Neutral, Positive, and Extremely Positive. It is useful for understanding public perception of a particular event.
Kaggle Link

In general, two frameworks are commonly used by data scientists to extract information and build models from raw data: the Cross-Industry Standard Process for Data Mining (CRISP-DM) and the Obtain, Scrub, Explore, Model, and Interpret (OSEMN) framework. I explain CRISP-DM and demonstrate its use directly in the accompanying program.
Medium Link

The Naïve Bayes classifier is built on probability theory. For example, given a data frame of texts, it can estimate whether each text is about sports or not.
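The sports-or-not example can be sketched as a tiny Naïve Bayes classifier with Laplace smoothing (a minimal pure-Python illustration; the training corpus below is invented, not the article's actual data):

```python
import math
from collections import Counter

# Invented toy corpus: (text, label) pairs.
train = [
    ("a great game", "sports"),
    ("the election was over", "not sports"),
    ("very clean match", "sports"),
    ("a clean but forgettable game", "sports"),
    ("it was a close election", "not sports"),
]

def fit(data):
    # Count classes and per-class word frequencies.
    class_counts = Counter(label for _, label in data)
    word_counts = {c: Counter() for c in class_counts}
    vocab = set()
    for text, label in data:
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    # Pick the class with the highest log posterior, using add-one
    # (Laplace) smoothing for unseen words.
    best, best_lp = None, float("-inf")
    total = sum(class_counts.values())
    for c in class_counts:
        lp = math.log(class_counts[c] / total)
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in text.split():
            lp += math.log((word_counts[c][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

model = fit(train)
label = predict("a very close game", *model)
```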
Medium Link

Ridge Regression is a regularized form of linear regression. The cost function, also known as the loss function, is the function minimized to obtain the regression parameter θ; adding a penalty term to this cost function is the statistical basis of the Ridge Regression method.
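For a single feature with no intercept, minimizing the penalized cost J(θ) = Σ(yᵢ − θxᵢ)² + αθ² has the closed form θ = Σxᵢyᵢ / (Σxᵢ² + α), which makes the shrinkage effect easy to see (a minimal sketch; the data and α are illustrative):

```python
# Closed-form ridge estimate for one feature, no intercept:
# theta = sum(x*y) / (sum(x^2) + alpha).  alpha = 0 recovers least squares.
def ridge_1d(x, y, alpha):
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]                      # exactly y = 2x
theta_ols = ridge_1d(x, y, alpha=0.0)    # plain least squares: 2.0
theta_ridge = ridge_1d(x, y, alpha=2.0)  # the penalty shrinks theta toward 0
```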
Medium Link

In general, logistic regression classifies by setting a probability threshold. If a predicted probability exceeds the threshold, the model assigns the value to the positive class (classified as 1); otherwise, it assigns the value to the negative class (classified as 0).
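The decision rule can be sketched as follows (a minimal illustration: a sigmoid turns a linear score into a probability, and a 0.5 threshold assigns the class; the weights and inputs are invented):

```python
import math

def sigmoid(z):
    # Map a linear score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, w, b, threshold=0.5):
    # Positive class (1) if the probability exceeds the threshold, else 0.
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return (1 if p >= threshold else 0), p

w, b = [1.5, -2.0], 0.1
label_pos, p_pos = classify([2.0, 0.5], w, b)  # score  2.1  -> p > 0.5 -> 1
label_neg, p_neg = classify([0.1, 1.0], w, b)  # score -1.75 -> p < 0.5 -> 0
```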
Wordpress Link

One of the most widely relied-upon tools for classification is AdaBoost, one of the oldest tools based on the boosting method. Its algorithm combines many weak classifiers into a single very strong classifier.
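The boosting idea can be sketched with decision stumps on a tiny 1D dataset: each round picks the stump with the lowest weighted error, then reweights the examples so later stumps focus on earlier mistakes (a minimal sketch; the data, labels in {+1, −1}, and round count are illustrative):

```python
import math

X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 1, -1, -1, 1, 1]  # no single threshold separates this pattern

def stump(threshold, sign):
    # Weak classifier: predicts `sign` below the threshold, -sign above.
    return lambda x: sign if x < threshold else -sign

def adaboost(X, y, rounds=3):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    candidates = [stump(t + 0.5, s) for t in range(6) for s in (1, -1)]
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        h = min(candidates,
                key=lambda h: sum(wi for wi, xi, yi in zip(w, X, y)
                                  if h(xi) != yi))
        err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * h(xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, h))
    # Final classifier: sign of the alpha-weighted vote.
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

classify = adaboost(X, y)
preds = [classify(x) for x in X]
```

Three weak stumps, none of which fits the data alone, combine into an ensemble that classifies every training point correctly.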
Wordpress Link

Linear regression is a basic statistical tool for predictive analysis.
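Simple linear regression via ordinary least squares can be sketched in a few lines: slope = cov(x, y) / var(x) and intercept = mean(y) − slope · mean(x) (a minimal illustration; the data points are invented and lie exactly on y = 2x + 1 so the expected fit is obvious):

```python
# Ordinary least squares for one predictor.
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
slope, intercept = fit_line(x, y)
```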
Wordpress Link

This program calculates the decay time of Uranium-235. In addition, it computes the machine rounding error and the appropriate grid spacing for the simulation.
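The decay law itself is N(t) = N₀·e^(−λt) with λ = ln 2 / t_half, which can be sketched directly (the U-235 half-life of 7.04 × 10⁸ years is the commonly quoted value; N₀ is illustrative, and this sketch is analytic rather than the program's numerical scheme):

```python
import math

T_HALF = 7.04e8             # U-235 half-life in years (commonly quoted value)
LAM = math.log(2) / T_HALF  # decay constant, 1/years

def remaining(n0, t):
    # Number of undecayed nuclei after time t (years).
    return n0 * math.exp(-LAM * t)

n0 = 1.0e6
after_one_half_life = remaining(n0, T_HALF)  # should be about n0 / 2
```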
GitHub Link

This program calculates the speed of a racing athlete when the power output is known. The general formulation is discretised into a form that is computationally easy to solve, yielding the speed under the defined constraints and parameters.
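The discretisation can be sketched with the common constant-power model m·dv/dt = P/v (no drag; this model and all parameter values are assumptions for illustration, not necessarily the program's actual formulation). The Euler update is v[i+1] = v[i] + (P / (m·v[i]))·dt:

```python
# Euler integration of m*dv/dt = P/v for an athlete at constant power P (W),
# mass m (kg), starting at speed v0 (m/s).
def speed_at_constant_power(P, m, v0, t_end, dt=0.01):
    v, t = v0, 0.0
    while t < t_end:
        v += (P / (m * v)) * dt  # Euler step
        t += dt
    return v

# The analytic solution is v = sqrt(v0**2 + 2*P*t/m), about 11.4 m/s here.
v = speed_at_constant_power(P=400.0, m=70.0, v0=4.0, t_end=10.0)
```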
GitHub Link

This program examines several numerical methods (Euler, Euler-Cromer, and Verlet) for simulating pendulum motion, taking each method's accuracy into account.
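The Euler vs. Euler-Cromer comparison can be sketched for the small-angle pendulum d²θ/dt² = −(g/L)θ: explicit Euler's energy grows without bound, while Euler-Cromer (updating θ with the *new* angular velocity) keeps it bounded (a minimal sketch; the parameters and step count are illustrative, and Verlet is omitted for brevity):

```python
g, L, dt, steps = 9.8, 1.0, 0.01, 5000

def simulate(cromer):
    theta, omega = 0.2, 0.0
    for _ in range(steps):
        omega_new = omega - (g / L) * theta * dt
        # Euler uses the old omega; Euler-Cromer uses the updated one.
        theta += (omega_new if cromer else omega) * dt
        omega = omega_new
    # Energy per unit mass for the linearized pendulum.
    return 0.5 * L**2 * omega**2 + 0.5 * g * L * theta**2

e0 = 0.5 * g * L * 0.2**2       # initial energy per unit mass
euler_energy = simulate(cromer=False)   # grows far beyond e0
cromer_energy = simulate(cromer=True)   # stays close to e0
```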
GitHub Link

This program simulates the governing equations for E and H in one dimension and computes Hx, Hy, Dz, and Ez in three dimensions.
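The 1D case can be sketched as a standard FDTD leapfrog in normalized units, with Ez and Hy staggered in space and time (a minimal sketch, not the program's actual scheme; the grid size, step count, Courant factor 0.5, and Gaussian source are illustrative):

```python
import math

n, steps = 200, 150
ez = [0.0] * n
hy = [0.0] * n

for t in range(steps):
    # Update H from the spatial difference (curl) of E.
    for k in range(n - 1):
        hy[k] += 0.5 * (ez[k + 1] - ez[k])
    # Update E from the spatial difference (curl) of H.
    for k in range(1, n):
        ez[k] += 0.5 * (hy[k] - hy[k - 1])
    # Inject a Gaussian pulse at the center of the grid.
    ez[n // 2] += math.exp(-((t - 30) ** 2) / 100.0)
```

With the Courant factor at 0.5 the scheme is stable, so the pulse splits into two waves traveling outward without blowing up.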
GitHub Link

This program solves various problems using Monte Carlo methods and random walks. One problem solved with Monte Carlo is estimating the value of Pi; a random walk is used to find the solution of an equation.
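The Monte Carlo estimate of Pi can be sketched in a few lines: the fraction of uniform random points in the unit square that fall inside the quarter circle approaches Pi/4 (the sample count and seed are illustrative):

```python
import random

def estimate_pi(n, seed=42):
    # Count points (x, y) in [0,1)^2 with x^2 + y^2 <= 1, then scale by 4.
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n

pi_estimate = estimate_pi(100_000)
```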
GitHub Link

This program calculates the motion of electrons in a 1-dimensional NaCl crystal with the help of the Lennard-Jones potential, using Matlab.
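The Lennard-Jones pair potential at the heart of the simulation is V(r) = 4ε[(σ/r)¹² − (σ/r)⁶], sketched here in Python rather than Matlab (ε and σ are illustrative reduced units, not the program's NaCl parameters):

```python
# Lennard-Jones pair potential; its minimum V = -eps sits at r = 2^(1/6)*sigma.
def lennard_jones(r, eps=1.0, sigma=1.0):
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)

r_min = 2.0 ** (1.0 / 6.0)    # location of the potential minimum
v_min = lennard_jones(r_min)  # equals -eps there
```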
GitHub Link

This program calculates the band structure of various materials using the Julia programming language. The results are then compared with those of similar programs, namely Abinit and VASP.
Link