Lecture 1

IT is for storing, retrieving and

What you need to know when developing web software:

hardware
operating system
software applications and development
database
networking
internet & web technologies
security

data vs information

Data is just raw, objective bits: things that are simply observed as they are.

Information is knowledge gained by analyzing data.
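
A minimal sketch to make the distinction concrete. The temperature readings and variable names are made up for illustration, not from the lecture: the list is the data, and what we compute from it is the information.

```python
from statistics import mean

# Data: raw observations, recorded as-is (hypothetical readings in degrees C).
readings = [21.5, 22.0, 23.5, 25.0]

# Information: knowledge gained by analyzing the data.
average = mean(readings)
is_warming = readings[-1] > readings[0]

print(f"average={average}, warming={is_warming}")
```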

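So the idea is that this class walks through the whole cycle of IT as it is used in business.

Data collection → model building → model deployment, the usual pipeline. To get there, we cover items 3, 4, and 6 from the list above: software development, databases, and the internet.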

Grading

10% attendance, 45% homework/assignments, 45% final exam.

We get all the correct answers for the assignments on July 25, and the final exam will mostly be about coding: visualization, handling data with pandas and numpy, and SQL.

Lecture 1 covers the basic command line

Basic command line

Only the ones that look useful.

uname -a : Show system and kernel
head -n1 /etc/issue : Show distribution
ctrl + l = clear (this one might be handy!!)

git

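How to unstage a file from the staging area: git restore --staged <file name>

How to create a branch (the basic flow: create a branch, commit to it, and finally merge it into main)

git switch -c new_branch : create a new branch and switch to it (capital -C would also reset the branch if it already exists)

git switch main : go back to the main branch

git merge --no-ff new_branch

git merge merge_source_branch

Merges the changes from merge_source_branch into the branch you are currently on.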

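Lecture 2

According to the professor, Python and SQL are the most important tools for data analysis. Huh, is that so. Well, I used both quite a bit when building my horse-racing AI, and Data Science Training Camp 1 felt very useful too. At the company, I expect to be assigned to a team that collects data and designs data pipelines, but being able to do the analysis myself is definitely better, so I want to get SQL and Python down perfectly. So, let's work hard.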

About databases

NoSQL stands for "Not Only SQL"; it does not mean "Not SQL". This feels important.

About ER diagrams

There are entities, attributes, and relationships. Simple enough.
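
A rough sketch of how an ER diagram could map onto tables, using Python's built-in sqlite3. The `department` and `employee` tables and their columns are hypothetical, picked only for illustration: each entity becomes a table, each attribute a column, and the relationship is expressed as a foreign key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Entity "department" with attributes dept_no and dept_name.
con.execute("""
    CREATE TABLE department (
        dept_no   TEXT PRIMARY KEY,
        dept_name TEXT NOT NULL
    )
""")

# Entity "employee"; the dept_no foreign key encodes the relationship
# "an employee works in a department".
con.execute("""
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_no TEXT NOT NULL REFERENCES department(dept_no)
    )
""")

con.execute("INSERT INTO department VALUES ('d001', 'Marketing')")
con.execute("INSERT INTO employee VALUES (1, 'Alice', 'd001')")

# An employee pointing at a department that does not exist is rejected.
try:
    con.execute("INSERT INTO employee VALUES (2, 'Bob', 'd999')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```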

Relational databases are still the foundation

The point was: don't forget how to do database design.

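SQL

DDL: data definition language

We practiced it with sqlite too!!

What makes sqlite great is that it is a database that runs as a library. With mysql or postgres you run a client that connects to a server, but with sqlite the database is just a file. Well, with mysql and postgres the server is just managing files too, so it is essentially the same thing. The current version is sqlite3, by the way.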
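
A small sketch of the "sqlite is just a file" point, using Python's built-in sqlite3. The file name `example.db` and the `memo` table are hypothetical; the file lives in a throwaway temp directory so nothing is left behind.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "example.db")

    # No server to start: connecting creates/opens the file.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE memo (id INTEGER PRIMARY KEY, body TEXT)")
    con.execute("INSERT INTO memo (body) VALUES ('hello')")
    con.commit()
    con.close()

    file_exists = os.path.exists(db_path)

    # Reopening the same file gives the data back.
    con = sqlite3.connect(db_path)
    rows = con.execute("SELECT id, body FROM memo").fetchall()
    con.close()

print(file_exists, rows)
```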

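I'm still shaky on how to use joins!!

I want to get this down perfectly. Let's work hard to master joins. Left join and right join.

First, left join

A left join attaches the right table to the left table, keeping every row of the left table. You also have to decide which column to join on.

It looks like this.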

SELECT * 
FROM 
    dept_manager as dm 
    LEFT JOIN
    departments as dp
    on dm.dept_no = dp.dept_no

You can also stack more joins.

SELECT * 
FROM 
    dept_manager as dm 
    LEFT JOIN
    departments as dp
    on dm.dept_no = dp.dept_no
    LEFT JOIN
    employees as e
    on dm.emp_no = e.emp_no

Next, inner join

SELECT * 
FROM 
    dept_manager as dm
    INNER JOIN 
    employees as e
    ON dm.emp_no = e.emp_no

By the way,

When you write just "JOIN" without specifying any prefixes like "INNER" or "LEFT," it is assumed that you are using an inner join. An inner join returns only the rows that have matching values in both the tables being joined based on the specified join condition.
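
A quick check of that claim with sqlite3 and an in-memory database. The table names follow the queries above, but the rows are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept_manager (emp_no INTEGER, dept_no TEXT);
    CREATE TABLE employees (emp_no INTEGER, first_name TEXT);
    INSERT INTO dept_manager VALUES (1, 'd001'), (2, 'd002'), (9, 'd003');
    INSERT INTO employees VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
""")

# Same query twice; only the join keyword changes.
query = """
    SELECT dm.emp_no, dm.dept_no, e.first_name
    FROM dept_manager AS dm
    {join} employees AS e ON dm.emp_no = e.emp_no
    ORDER BY dm.emp_no
"""
plain_join = con.execute(query.format(join="JOIN")).fetchall()
inner_join = con.execute(query.format(join="INNER JOIN")).fetchall()

print(plain_join == inner_join, plain_join)
```

Manager 9 has no row in `employees` and employee 3 is not a manager, so neither shows up in either result.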

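The difference between inner join and outer join (left join and right join)

This is actually simple: you write a join condition, and inner and outer joins differ in how they treat the rows that did not match that condition.

Here is what ChatGPT says.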

The main difference between an inner join and an outer join lies in how they handle unmatched rows between the tables being joined. Here's a breakdown of each:

Inner Join: An inner join returns only the rows that have matching values in both tables based on the specified join condition. It combines the rows from both tables where the join condition is satisfied, excluding any unmatched rows.

Example:
Let's say we have two tables, Table1 and Table2, with a common column called "id." An inner join between the two tables will return only the rows where the "id" values match in both tables.

SELECT *
FROM Table1
INNER JOIN Table2 ON Table1.id = Table2.id;

Left Outer Join (or Left Join): A left outer join returns all the rows from the left (or first) table and the matching rows from the right (or second) table based on the join condition. If there are no matches in the right table, NULL values are used for the columns of the right table in the result set.

Example:
Using the same tables as before, a left outer join will return all the rows from Table1 and the matching rows from Table2 based on the "id" column. For rows in Table1 that have no match in Table2, the Table2 columns will contain NULL values in the result set.

SELECT *
FROM Table1
LEFT JOIN Table2 ON Table1.id = Table2.id;

Right Outer Join (or Right Join): A right outer join is similar to a left outer join, but it returns all the rows from the right table and the matching rows from the left table. Rows from the right table that have no match in the left table will have NULL values for the columns of the left table in the result set.

Example:
Continuing with the previous tables, a right outer join will return all the rows from Table2 and the matching rows from Table1 based on the "id" column. Rows from Table2 that have no match in Table1 will have NULL values in the Table1 columns.

SELECT *
FROM Table1
RIGHT JOIN Table2 ON Table1.id = Table2.id;

Full Outer Join: A full outer join returns all the rows from both tables, including both the matched and unmatched rows. If there is no match for a row in the other table, NULL values are used for the columns of the table that lacks a match.

Example:
Using Table1 and Table2, a full outer join will return all the rows from both tables. If there are unmatched rows in either table, the corresponding columns will contain NULL values in the result set.

    SELECT *
    FROM Table1
    FULL JOIN Table2 ON Table1.id = Table2.id;

It's worth noting that the specific syntax for joins may vary slightly depending on the database management system (DBMS) you are using.

So: an inner join returns only the matching rows, while an outer join also returns the rows that did not match. Got it!! Learned something!!
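
To see all four join types side by side, here is a sketch that swaps in pandas `merge` instead of SQL (its `how` argument plays the role of the join type, with `"outer"` corresponding to a full outer join). The two small frames are hypothetical stand-ins for Table1 and Table2, and pandas shows unmatched values as NaN rather than NULL.

```python
import pandas as pd

# Hypothetical stand-ins for Table1 and Table2, sharing an "id" column.
table1 = pd.DataFrame({"id": [1, 2, 3], "left_val": ["a", "b", "c"]})
table2 = pd.DataFrame({"id": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Collect which ids survive each kind of join.
ids_by_join = {}
for how in ["inner", "left", "right", "outer"]:
    merged = table1.merge(table2, on="id", how=how)
    ids_by_join[how] = sorted(merged["id"].tolist())
    print(how, ids_by_join[how])

# In the left join, id 1 has no partner in table2, so right_val is missing (NaN).
left = table1.merge(table2, on="id", how="left")
missing_right = left.loc[left["id"] == 1, "right_val"].isna().all()
```

Only ids 2 and 3 exist in both frames, so they are the only ones the inner join keeps; each outer variant adds back the unmatched ids from its "kept" side.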

Lecture 3

This one teaches how to interact with a database from python. Pretty nice.

import sqlite3
import pandas as pd

# create a new database connection
db_con = sqlite3.connect('myCompany.db')

# create cursor object
cursor = db_con.cursor()

# I probably wouldn't register data from python in practice... or would I? Either way, no harm in writing it down.
cursor.execute('''CREATE TABLE IF NOT EXISTS PRODUCT( product_ID INTEGER PRIMARY KEY, product_name CHAR(20) NOT NULL, unit_price FLOAT )''')

cursor.execute('''INSERT INTO PRODUCT( product_ID, product_name, unit_price) VALUES (1001, 'ds', 100)''')

cursor.execute('''INSERT INTO PRODUCT( product_ID, product_name, unit_price) VALUES (1002, 'game_boy', 100)''')

cursor.execute('''INSERT INTO PRODUCT( product_ID, product_name, unit_price) VALUES (1003, 'wii', 100)''')

cursor.execute('''INSERT INTO PRODUCT( product_ID, product_name, unit_price) VALUES (1004, 'switch', 100)''')

cursor.execute('''INSERT INTO PRODUCT( product_ID, product_name, unit_price) VALUES (1005, 'mario', 100)''')

# don't forget to commit
db_con.commit()

And here is how to connect to the db from python and pipe the result into pandas


def get_data(cursor):
  all_rows = cursor.fetchall()
  #[all_rows.append(np.array(row)) for row in cursor]
  column_names = [description[0] for description in cursor.description]
  df = pd.DataFrame(data=all_rows, columns = column_names)
  # hide index
  blankIndex=[''] * len(df)
  df.index=blankIndex
  return df


db_con = sqlite3.connect('myCompany.db')
cursor = db_con.cursor()
query = 'SELECT * FROM PRODUCT LIMIT 10'
cur = cursor.execute(query)


get_data(cur)

# or alternatively

db_con = sqlite3.connect('myCompany.db')

def run_query(query):
  return pd.read_sql_query(query,db_con)


query = 'SELECT * FROM PRODUCT'
run_query(query)
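
For trying this out without the myCompany.db file, a self-contained sketch using an in-memory database. The `?` placeholders, `executemany`, and the `params` argument are my additions and were not part of what was shown in class; the table mirrors the PRODUCT table above with only a few of its rows.

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE PRODUCT (
        product_ID   INTEGER PRIMARY KEY,
        product_name CHAR(20) NOT NULL,
        unit_price   FLOAT
    )
""")

# "?" placeholders let sqlite3 fill in the values; executemany inserts all rows.
products = [(1001, "ds", 100), (1002, "game_boy", 100), (1003, "wii", 100)]
con.executemany(
    "INSERT INTO PRODUCT (product_ID, product_name, unit_price) VALUES (?, ?, ?)",
    products,
)
con.commit()

# read_sql_query also accepts params, so the filter value stays out of the SQL string.
df = pd.read_sql_query(
    "SELECT product_ID, product_name FROM PRODUCT WHERE product_ID >= ? ORDER BY product_ID",
    con,
    params=(1002,),
)
con.close()
print(df)
```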


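# Apparently you can also do this with pyspark!! It is supposed to speed things up!!

The message was: look into the details on your own.

The second half was about visualization

This part is about matplotlib and seaborn. We did this in Data Camp 1 as well, but it works as a review. This stuff is a matter of practice. If I go into a data-related field, this is something I absolutely want to master. Whether I will actually go in that direction is unclear, though... I do want to analyze horse-racing data. Keep at it...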
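
As a reminder of the basic matplotlib flow, a minimal bar chart sketch. The product names and numbers are made up, and the "Agg" backend plus the in-memory PNG are only there so it runs without a display; in a notebook the figure would normally just be shown.

```python
import io

import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt

# Hypothetical numbers, only to have something to plot.
products = ["ds", "wii", "switch"]
units_sold = [30, 45, 80]

fig, ax = plt.subplots()
ax.bar(products, units_sold)
ax.set_xlabel("product")
ax.set_ylabel("units sold")
ax.set_title("Units sold per product")

# Save to an in-memory PNG; in a notebook you would call plt.show() instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```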

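Lecture 4

The basics of pandas and numpy. We also did this in Data Camp 2, but being good at it is a big plus. I want to master it together with SQL.

numpy and pandas basics

google_colab

How to load small_movieDB.db (sqlite3 data) in python and analyze it

google_colab This is handy. Which makes me wonder: what do I do when I want to load data from something like mysql?

I asked ChatGPT. Apparently it is not much different from what we are doing here. Easy.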

pip install pandas mysql-connector-python

import pandas as pd
import mysql.connector

# Replace the placeholders with your MySQL connection details
host = 'localhost'
user = 'your_username'
password = 'your_password'
database = 'your_database_name'

# Create a connection object
conn = mysql.connector.connect(host=host, user=user, password=password, database=database)

# Specify your SQL query to retrieve the desired data
query = "SELECT * FROM your_table_name"

# Use the connection and query to fetch data into a DataFrame
df = pd.read_sql(query, conn)

conn.close()

Handling missing data

There are two ways to handle it: fill the gaps with the mean, or drop them.
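
Both options in pandas, as a minimal sketch on a made-up frame (the `name` and `score` columns are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [10.0, np.nan, 30.0, 20.0]})

# Option 1: fill missing values with the column mean (NaN is skipped when averaging).
filled = df.copy()
filled["score"] = filled["score"].fillna(filled["score"].mean())

# Option 2: drop every row that has a missing score.
dropped = df.dropna(subset=["score"])

print(filled)
print(dropped)
```

Filling keeps all four rows but invents a value; dropping keeps only observed values but loses a row. Which is better depends on the data.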

Handling the JSON format

This gets used a ton in real work, so the professor wants us to master it.
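
A minimal sketch of the usual JSON round trip, assuming the standard `json` module plus `pd.json_normalize` for nested records. The JSON text and its field names are made up for illustration:

```python
import json
import pandas as pd

# Hypothetical JSON text, e.g. what a web API might return.
raw = '[{"name": "city_a", "stats": {"population": 100}}, {"name": "city_b", "stats": {"population": 250}}]'

records = json.loads(raw)          # JSON text -> list of Python dicts
first_city = records[0]["name"]

# json_normalize flattens nested objects into dotted column names.
df = pd.json_normalize(records)

# And back again: Python objects -> JSON text.
round_trip = json.dumps(records)

print(df)
```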

Notes

A Series can be created from a dictionary. Also, df["Population"] is a Series. iloc stands for integer location.
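
The three notes above as a small sketch. The labels and numbers are made up; only the `Population` column name comes from the notes.

```python
import pandas as pd

# A Series built from a dictionary: the keys become the index.
population = pd.Series({"A": 100, "B": 250, "C": 80})

df = pd.DataFrame({"Population": population, "Area": pd.Series({"A": 10, "B": 50, "C": 4})})

column = df["Population"]   # selecting one column gives a Series
by_position = df.iloc[1]    # iloc: second row, by integer position
by_label = df.loc["B"]      # loc: the same row, by index label

print(type(column).__name__)
print(by_position)
```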
